Mixed Media Reality Recognition with Image Tracking

ABSTRACT

An MMR system integrating image tracking and recognition comprises a plurality of mobile devices, a pre-processing server or MMR gateway, and an MMR matching unit, and may include an MMR publisher. The MMR matching unit receives an image query from the pre-processing server or MMR gateway and sends it to one or more of the recognition units to identify a recognition result. Image tracking information also is provided for determining relative locations of images to each other. The mobile device includes an image tracker for providing at least a portion of the image tracking information. The disclosure also includes methods for image tracking-assisted recognition, recognition of multiple images using a single image query, and improved image tracking using MMR recognition.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a division of U.S. patent application Ser. No. 12/247,205, titled “Mixed Media Reality Recognition with Image Tracking,” filed Oct. 7, 2008, which is a continuation in part of U.S. patent application Ser. No. 11/461,017, titled “System And Methods For Creation And Use Of A Mixed Media Environment,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/461,279, titled “Method And System For Image Matching In A Mixed Media Environment,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/461,286, titled “Method And System For Document Fingerprinting Matching In A Mixed Media Environment,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/461,294, titled “Method And System For Position-Based Image Matching In A Mixed Media Environment,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/461,300, titled “Method And System For Multi-Tier Image Matching In A Mixed Media Environment,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/461,126, titled “Integration And Use Of Mixed Media Documents,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/461,143, titled “User Interface For Mixed Media Reality,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/461,268, titled “Authoring Tools Using A Mixed Media Environment,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/461,272, titled “System And Methods For Creation And Use Of A Mixed Media Environment With Geographic Location Information,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/461,064, titled “System And Methods For Portable Device For Mixed Media System,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/461,075, titled “System And Methods For Use Of Voice Mail And Email In A Mixed Media Environment,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/461,090, titled “System And Method For Using Individualized Mixed Document,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/461,037, titled “Embedding Hot Spots In Electronic Documents,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/461,085, titled “Embedding Hot Spots In Imaged Documents,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/461,091, titled “Shared Document Annotation,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/461,095, titled “Visibly-Perceptible Hot Spots In Documents,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/466,414, titled “Mixed Media Reality Brokerage Network and Methods of Use,” filed Aug. 22, 2006; U.S. patent application Ser. No. 11/461,147, titled “Data Organization and Access for Mixed Media Document System,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/461,164, titled “Database for Mixed Media Document System,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/461,024, titled “Triggering Actions With Captured Input In A Mixed Media Environment,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/461,032, titled “Triggering Applications Based On A Captured Text In A Mixed Media Environment,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/461,049, titled “Triggering Applications For Distributed Action Execution And Use Of Mixed Media Recognition As A Control Input,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/461,109, titled “Searching Media Content For Objects Specified Using Identifiers,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/827,530, titled “User Interface For Three-Dimensional Navigation,” filed Jul. 11, 2007; U.S. patent application Ser. No. 12/060,194, titled “Document-Based Networking With Mixed Media Reality,” filed Mar. 31, 2008; U.S. patent application Ser. No. 12/059,583, titled “Invisible Junction Feature Recognition For Document Security Or Annotation,” filed Mar. 31, 2008; U.S. patent application Ser. No. 12/060,198, titled “Document Annotation Sharing,” filed Mar. 31, 2008; U.S. patent application Ser. No. 12/060,200, titled “Ad Hoc Paper-Based Networking With Mixed Media Reality,” filed Mar. 31, 2008; U.S. patent application Ser. No. 12/060,206, titled “Indexed Document Modification Sharing With Mixed Media Reality,” filed Mar. 31, 2008; U.S. patent application Ser. No. 12/121,275, titled “Web-Based Content Detection In Images, Extraction And Recognition,” filed May 15, 2008; U.S. patent application Ser. No. 11/776,510, titled “Invisible Junction Features For Patch Recognition,” filed Jul. 11, 2007; U.S. patent application Ser. No. 11/776,520, titled “Information Retrieval Using Invisible Junctions and Geometric Constraints,” filed Jul. 11, 2007; U.S. patent application Ser. No. 11/776,530, titled “Recognition And Tracking Using Invisible Junctions,” filed Jul. 11, 2007; U.S. patent application Ser. No. 11/777,142, titled “Retrieving Documents By Converting Them to Synthetic Text,” filed Jul. 12, 2007; U.S. patent application Ser. No. 11/624,466, titled “Synthetic Image and Video Generation from Ground Truth Data,” filed Jan. 18, 2007; U.S. patent application Ser. No. 12/210,511, titled “Architecture For Mixed Media Reality Retrieval Of Locations And Registration Of Images,” filed Sep. 15, 2008; U.S. patent application Ser. No. 12/210,519, titled “Automatic Adaption Of An Image Recognition System To Image Capture Devices,” filed Sep. 15, 2008; U.S. patent application Ser. No. 12/210,532, titled “Computation Of A Recognizability Score (Quality Predictor) For Image Retrieval,” filed Sep. 15, 2008; U.S. patent application Ser. No. 12/210,540, titled “Combining Results Of Image Retrieval Processes,” filed Sep. 15, 2008; and is related to U.S. patent application Ser. No. 12/240,596, titled “Multiple Index Mixed Media Reality Recognition Using Unequal Priority Indexes,” filed Sep. 29, 2008; all of which are incorporated by reference herein in their entirety.

FIELD OF THE INVENTION

The techniques disclosed here relate to indexing and searching for mixed media documents formed from at least two media types, and more particularly, to recognizing images and other data using multiple-index Mixed Media Reality (MMR) recognition that uses printed media in combination with electronic media to retrieve mixed media documents.

BACKGROUND

Document printing and copying technology has been used for many years in many contexts. By way of example, printers and copiers are used in commercial office environments, in home environments with personal computers, and in document printing and publishing service environments. However, printing and copying technology has not been thought of previously as a means to bridge the gap between static printed media (i.e., paper documents), and the “virtual world” of interactivity that includes the likes of digital communication, networking, information provision, advertising, entertainment and electronic commerce.

Printed media has been the primary source of communicating information, such as newspapers and advertising information, for centuries. The advent and ever-increasing popularity of personal computers and personal electronic devices, such as personal digital assistant (PDA) devices and cellular telephones (e.g., cellular camera phones), over the past few years has expanded the concept of printed media by making it available in an electronically readable and searchable form and by introducing interactive multimedia capabilities, which are unparalleled by traditional printed media.

Unfortunately, a gap exists between the electronic multimedia-based world that is accessible electronically and the physical world of print media. For example, although almost everyone in the developed world has access to printed media and to electronic information on a daily basis, users of printed media and of personal electronic devices do not possess the tools and technology required to form a link between the two (i.e., for facilitating a mixed media document).

Moreover, there are particular advantageous attributes that conventional printed media provides such as tactile feel, no power requirements, and permanency for organization and storage, which are not provided with virtual or digital media. Likewise, there are particular advantageous attributes that conventional digital media provides such as portability (e.g., carried in storage of cell phone or laptop) and ease of transmission (e.g., email).

One particular problem in the prior art is that the image recognition process is computationally very expensive and can require seconds if not minutes to accurately recognize the page and location of a pristine document from an input query image. This can especially be a problem with a large data set, for example, millions of pages of documents, or for mobile devices with a large latency or limited bandwidth connection to an MMR server. Thus, there is a need for mechanisms to improve the speed at which recognition can be performed.

The process of image tracking, or finding relative correspondence between multiple images of the same object taken from different camera positions and possibly under different illuminations, is known in the art. Basic image tracking can find corresponding points in two images, and advanced tracking can use this information to determine camera position and movement. However, to date, image tracking has not been used to improve speed and accuracy in document recognition.

SUMMARY

The techniques disclosed here overcome the deficiencies of the prior art with an MMR system combined with image tracking functionality. The system is particularly advantageous because it provides faster and/or more accurate search results. The system is also advantageous because its unique architecture can be easily adapted and updated.

In one embodiment, the MMR system comprises a plurality of mobile devices, a computer, a pre-processing server or MMR gateway, and an MMR matching unit. Some embodiments also include an MMR publisher. The mobile devices are communicatively coupled to the pre-processing server or MMR gateway to send retrieval requests including image queries and other contextual information. The pre-processing server or MMR gateway processes the retrieval request and generates an image query that is passed on to the MMR matching unit. Image tracking information is also provided prior to recognition, e.g., by an image tracker on the mobile device and/or by a tracking manager on the server side. The MMR matching unit includes a dispatcher, a plurality of recognition units, and index tables, as well as an image registration unit. The MMR matching unit receives the image query and identifies a result including a document, the page, and the location on the page corresponding to the image query. The MMR matching unit includes a tracking manager for providing image tracking information and combining images and/or features according to various embodiments. A recognition result is returned to the mobile device via the pre-processing server or MMR gateway.

The present disclosure also includes a number of novel methods including a method for image tracking-assisted recognition, recognition of multiple images using a single image query, and improved image tracking using MMR recognition.

The features and advantages described herein are not all-inclusive and many additional features and advantages will be apparent to one of ordinary skill in the art in view of the figures and description. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and not to limit the scope of the inventive subject matter.

BRIEF DESCRIPTION OF THE DRAWINGS

The techniques disclosed here are illustrated by way of example, and not by way of limitation in the figures of the accompanying drawings in which like reference numerals are used to refer to similar elements.

FIG. 1A is a block diagram of one embodiment of a system of mixed media reality using multiple indexes.

FIG. 1B is a block diagram of another embodiment of a system of mixed media reality using multiple indexes.

FIG. 2A is a block diagram of a first embodiment of a mobile device, network, and pre-processing server or MMR gateway.

FIG. 2B is a block diagram of a second embodiment of a mobile device, network, and pre-processing server or MMR gateway.

FIGS. 2C-2H are block diagrams of various embodiments of a mobile device plug-in, pre-processing server or MMR gateway, and MMR matching unit showing various possible configurations.

FIG. 3A is a block diagram of an embodiment of a pre-processing server.

FIG. 3B is a block diagram of an embodiment of an MMR gateway.

FIG. 4A is a block diagram of a first embodiment of an MMR matching unit.

FIG. 4B is a block diagram of a second embodiment of the MMR matching unit.

FIG. 5 is a block diagram of an embodiment of a dispatcher.

FIG. 6A is a block diagram of a first embodiment of an image retrieval unit.

FIG. 6B is a block diagram of a second embodiment of the image retrieval unit.

FIG. 7 is a block diagram of an embodiment of a registration unit.

FIG. 8 is a block diagram of an embodiment of an image tracker.

FIG. 9 is a flowchart of an example method for retrieving a document and location from an input image.

FIG. 10 is a flowchart of an example method of image tracking-assisted recognition.

FIG. 11 is a flowchart showing an example method of recognition of a plurality of received images using a single image query.

FIG. 12A shows conversion of four received images into a merged image according to the method shown in FIG. 11.

FIG. 12B shows a cropped high resolution image result according to a hand-held scanner functionality.

FIG. 13 is a flowchart showing a method of improved image tracking using MMR.

DETAILED DESCRIPTION

An architecture for a mixed media reality (MMR) system 100 capable of receiving the query images and returning document pages and location as well as receiving images, hot spots, and other data and adding such information to the MMR system is described. In the following description, for purposes of explanation, numerous specific examples are set forth in order to provide a thorough understanding of the disclosure.

Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one example embodiment. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment. In particular, the present disclosure includes examples below of two distinct architectures, and some of the components are operable in both architectures while others are not.

Some portions of the detailed descriptions that follow are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

The present disclosure also describes an apparatus for performing the operations disclosed herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.

Finally, the algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will be apparent from the description below. In addition, the techniques disclosed here are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the techniques as described herein.

System Overview

FIG. 1A shows an embodiment of an MMR system 100 a. The MMR system 100 a comprises a plurality of mobile devices 102 a-102 n, a pre-processing server 103, and an MMR matching unit 106. In an alternative embodiment, the pre-processing server 103 and its functionality are integrated into the MMR matching unit 106. The techniques disclosed here provide an MMR system 100 a for image recognition combined with image tracking. The MMR system 100 a is particularly advantageous because it provides faster and/or more accurate search results resulting from the combined use of MMR recognition and image tracking. The MMR system 100 a is also advantageous because its unique architecture can be easily adapted and updated.

The mobile devices 102 a-102 n are communicatively coupled by signal lines 132 a-132 n, respectively, to the pre-processing server 103 to send a “retrieval request.” A retrieval request includes one or more of “image queries,” other contextual information, and metadata. In one embodiment, an image query is an image in any format, or one or more features of an image. Examples of image queries include still images, video frames and sequences of video frames. The mobile devices 102 a-102 n are mobile computing devices such as mobile phones, which include a camera to capture images. It should be understood that the MMR system 100 a will be utilized by hundreds or even millions of users. Thus, even though only two mobile devices 102 a, 102 n are shown, those skilled in the art will appreciate that the pre-processing server 103 may be simultaneously coupled to, receive and respond to retrieval requests from numerous mobile devices 102 a-102 n. Alternate embodiments for the mobile devices 102 a-102 n are described in more detail below with reference to FIGS. 2A and 2B.

As noted above, the pre-processing server 103 is able to couple to hundreds if not millions of mobile computing devices 102 a-102 n and service their retrieval requests. The pre-processing server 103 also may be communicatively coupled to the computer 110 by signal line 130 for administration and maintenance of the pre-processing server 103. The computer 110 can be any conventional computing device such as a personal computer. The main function of the pre-processing server 103 is processing retrieval requests from the mobile devices 102 a-102 n and returning recognition results back to the mobile devices 102 a-102 n. In one embodiment, the recognition results include one or more of a Boolean value (true/false) and if true, a page ID, and a location on the page. In other embodiments, the recognition results also include one or more from the group of actions, a message acknowledging that the recognition was successful (or not) and consequences of that decision, such as the sending of an email message, a document, actions defined within a portable document file, addresses such as URLs, binary data such as video, information capable of being rendered on the mobile device 102, menus with additional actions, raster images, image features, etc. The pre-processing server 103 generates an image query and recognition parameters from the retrieval request according to one embodiment, and passes them on to the MMR matching unit 106 via signal line 134. The pre-processing server 103 also may perform some image tracking computation according to one embodiment. Embodiments and operation of the pre-processing server 103 are described in greater detail below with reference to FIG. 3A.
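By way of illustration only, the retrieval request and the basic recognition result described above can be sketched as simple data records. This is a minimal sketch, not a definition of the actual message format: the class names RetrievalRequest and RecognitionResult, all field names, and the choice of a two-coordinate page location are assumptions introduced here for explanation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class RetrievalRequest:
    """Hypothetical container for what a mobile device sends."""
    image_queries: List[bytes] = field(default_factory=list)  # images or image features
    context: Dict[str, str] = field(default_factory=dict)     # other contextual information
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class RecognitionResult:
    """Hypothetical container for the basic result: a Boolean and,
    if true, a page ID and a location on the page."""
    recognized: bool
    page_id: Optional[str] = None
    location: Optional[Tuple[float, float]] = None  # assumed (x, y) on the page

    def __post_init__(self) -> None:
        # A positive result is only meaningful with both a page and a location.
        if self.recognized and (self.page_id is None or self.location is None):
            raise ValueError("a positive result needs a page ID and a location")
        # A negative result carries no page or location.
        if not self.recognized and (self.page_id is not None or self.location is not None):
            raise ValueError("a negative result must not carry a page ID or location")
```

In this sketch, a failed recognition is simply RecognitionResult(False), while a successful one carries the page ID and location. The richer results mentioned above (actions, URLs, menus, and so on) are omitted.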

The MMR matching unit 106 receives the image query from the pre-processing server 103 on signal line 134 and sends it to one or more of recognition units to identify a result including a document, the page and the location on the page corresponding to the image query, referred to generally throughout this application as the “retrieval process.” The result is returned from the MMR matching unit 106 to the pre-processing server 103 on signal line 134. In addition to the result, the MMR matching unit 106 may also return other related information such as hotspot data. The MMR matching unit 106 also includes components for receiving new content and updating and reorganizing index tables used in the retrieval process. The process of adding new content to the MMR matching unit 106 is referred to generally throughout this application as the “registration process.” Various embodiments of the MMR matching unit 106 and its components are described in more detail below with reference to FIGS. 4A-8.

FIG. 1B shows an embodiment of an MMR system 100 b. The MMR system 100 b comprises a plurality of mobile devices 102 a-102 n, an MMR gateway 104, an MMR matching unit 106, an MMR publisher 108 and a computer 110. The techniques disclosed here provide, in one aspect, an MMR system 100 b for use in newspaper publishing. The MMR system 100 b for newspaper publishing is particularly advantageous because it provides an automatic mechanism for a newspaper publisher to register images and content with the MMR system 100 b. The MMR system 100 b for newspaper publishing is also advantageous because it has a unique architecture adapted to respond to image queries formed of image portions or pages of a printed newspaper. The MMR system 100 b is also advantageous because it provides faster and/or more accurate search results resulting from the combined use of MMR recognition and image tracking, and its unique architecture can be easily adapted and updated.

The mobile devices 102 a-102 n are similar to those described above, except that they are communicatively coupled by signal lines 132 a-132 n, respectively, to the MMR gateway 104 to send a “retrieval request,” rather than to the pre-processing server 103. It should be understood that the MMR system 100 b will be utilized by hundreds or even millions of users that receive a traditional publication such as a daily newspaper.

As noted above, the MMR gateway 104 is able to couple to hundreds if not millions of mobile computing devices 102 a-102 n and service their retrieval requests. The MMR gateway 104 is also communicatively coupled to the computer 110 by signal line 130 for administration and maintenance of the MMR gateway 104 and running business applications. In one embodiment, the MMR gateway 104 creates and presents a web portal for access by the computer 110 to run business applications as well as access logs of use of the MMR system 100 b. The computer 110 can be any conventional computing device such as a personal computer. The main function of the MMR gateway 104 is processing retrieval requests from the mobile devices 102 a-102 n and returning recognition results back to the mobile devices 102 a-102 n. The types of recognition results produced by the MMR gateway 104 are similar to those described above in conjunction with pre-processing server 103. The MMR gateway 104 processes received retrieval requests by performing user authentication, accounting, analytics and other communication. The MMR gateway 104 also generates an image query and recognition parameters from the retrieval request, and passes them on to the MMR matching unit 106 via signal line 134. Embodiments and operation of the MMR gateway 104 are described in greater detail below with reference to FIG. 3B.

The MMR matching unit 106 is similar to that described above in conjunction with FIG. 1A, except that the MMR matching unit 106 receives the image query from the MMR gateway 104 on signal line 134 as part of the “retrieval process.” The result is returned from the MMR matching unit 106 to the MMR gateway 104 on signal line 134. In one embodiment, the MMR matching unit 106 is coupled to the output of the MMR publisher 108 via signal lines 138 and 140 to provide new content used to update index tables of the MMR matching unit 106. In an alternate embodiment, the MMR publisher 108 is coupled to the MMR gateway 104 by signal line 138 and the MMR gateway 104 is in turn coupled by signal line 136 to the MMR matching unit 106. In this alternate embodiment, the MMR gateway 104 extracts augmented data such as hotspot information, stores it, and passes the images, page references and other information to the MMR matching unit 106 for updating of the index tables.

The MMR publisher 108 includes a conventional publishing system used to generate newspapers or other types of periodicals. In one embodiment, the MMR publisher 108 also includes components for generating additional information needed to register images of printed documents with the MMR system 100. The information provided by the MMR publisher 108 to the MMR matching unit 106 includes an image file, bounding box data, hotspot data, and a unique page identification number. In the simplest embodiment, this is a document in portable document format by Adobe Corp. of San Jose, Calif., and bounding box information.

Mobile Device 102

Referring now to FIGS. 2A and 2B, the first and second embodiments of the mobile device 102 will be described.

FIG. 2A shows a first embodiment of the coupling 132 between the mobile device 102 and the pre-processing server 103 or MMR gateway 104, according to the above-described embodiments of system 100 a, 100 b. In the embodiment of FIG. 2A, the mobile device 102 is any mobile phone (or other portable computing device with communication capability) that includes a camera. For example, the mobile device 102 may be a smart phone such as the Blackberry® manufactured and sold by Research In Motion. The mobile device 102 is adapted for wireless communication with the network 202 by a communication channel 230. The network 202 is a conventional type such as a cellular network maintained by a wireless carrier and may include a server. In this embodiment, the mobile device 102 captures an image and sends the image to the network 202 over communication channel 230 such as by using a multimedia messaging service (MMS). The network 202 can also use the communication channel 230 to return results such as using MMS or using a short message service (SMS). As illustrated, the network 202 is in turn coupled to the pre-processing server 103 or MMR gateway 104 by signal lines 232. Signal lines 232 represent a channel for sending MMS or SMS messages as well as a channel for receiving hypertext transfer protocol (HTTP) requests and sending HTTP responses. Those skilled in the art will recognize that this is just one example of the coupling between the mobile device 102 and the pre-processing server 103 or MMR gateway 104. In an alternate embodiment for example, Bluetooth®, WiFi, or any other wireless communication protocol may be used as part of communication coupling between the mobile device 102 and the pre-processing server 103 or MMR gateway 104. The mobile device 102 and the pre-processing server 103 or MMR gateway 104 could be coupled in any other ways understood by those skilled in the art (e.g., direct data connection, SMS, WAP, email) so long as the mobile device 102 is able to transmit images to the pre-processing server 103 or MMR gateway 104 and the pre-processing server 103 or MMR gateway 104 is able to respond by sending document identification, page number, and location information.

Referring now to FIG. 2B, a second embodiment of the mobile device 102 is shown. In this second embodiment, the mobile device 102 is a smart phone such as the iPhone™ manufactured and sold by Apple Computer Inc. of Cupertino Calif. The second embodiment has a number of components similar to those of the first embodiment, and therefore, like reference numbers are used to reference like components with the same or similar functionality. Notable differences between the first embodiment and the second embodiment include a quality predictor plug-in 204 that is installed on the mobile device 102, an image tracker 240, and a Web server 206 coupled by signal line 234 to the network 202. The quality predictor plug-in 204 analyzes the images captured by the mobile device 102. The quality predictor plug-in 204 provides additional information produced by its analysis and includes that information as part of the retrieval request sent to the pre-processing server 103 or MMR gateway 104 to improve the accuracy of recognition. In an alternate embodiment, the output of the quality predictor plug-in 204 is used to select which images are transmitted from the mobile device 102 to the pre-processing server 103 or MMR gateway 104. For example, only those images that have a predicted quality above a predetermined threshold (e.g., images capable of being recognized) are transmitted from the mobile device 102 to the pre-processing server 103 or MMR gateway 104. Since transmission of images requires significant bandwidth and the communication channel 230 between the mobile device 102 and the network 202 may have limited bandwidth, using the quality predictor plug-in 204 to select which images to transmit is particularly advantageous.
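The threshold-based selection in the alternate embodiment above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name select_images_to_send, the predict_quality callback, the 0-to-1 score range, and the default threshold of 0.5 are all hypothetical and are not specified by the disclosure.

```python
from typing import Callable, Iterable, List, TypeVar

Image = TypeVar("Image")


def select_images_to_send(
    images: Iterable[Image],
    predict_quality: Callable[[Image], float],
    threshold: float = 0.5,  # assumed value; the text only says "predetermined"
) -> List[Image]:
    """Keep only images whose predicted quality is above the threshold.

    predict_quality stands in for the quality predictor plug-in; how the
    score is computed is outside the scope of this sketch.
    """
    return [image for image in images if predict_quality(image) > threshold]
```

Filtering on the device in this way means that images unlikely to be recognized never consume bandwidth on the communication channel.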

The second embodiment shown in FIG. 2B also illustrates how the results returned from the pre-processing server 103 or MMR gateway 104, or other information provided by the quality predictor plug-in 204, can be used by the mobile device 102 to access hotspot or augmented information available on a web server 206. In such a case, the results from the pre-processing server 103 or MMR gateway 104 or output of the quality predictor plug-in 204 would include information that can be used to access Web server 206 such as with a conventional HTTP request and using web access capabilities of the mobile device 102.

The image tracker 240 is software and routines that allow recognition and tracking of the look-at position and viewing region of the mobile device 102 based on received images, and may do so according to image tracking methods known in the art, sometimes referred to as “motion tracking.” For example, for current motion estimation algorithms, see Kuhn, P., “Algorithm, Complexity Analysis and VLSI Architectures for MPEG-4 Motion Estimation,” Kluwer Academic Publishers, Norwell, Mass. (1999). The image tracker 240 may include software and routines for tracking camera motion as a projective transformation across video frames, tracking camera motion with respect to the position of a paper, associating received images with recognized images, and correcting accumulated drift. The image tracker 240 is further described in conjunction with FIG. 8.

FIG. 8 is a block diagram of an embodiment of an image tracker 240. The image tracker 240 receives images and provides image tracking information to the mobile device client 250. In this embodiment, if the image tracker 240 determines that the received image belongs to the same page as the last image submitted to the pre-processing server 103 or MMR gateway 104, the current image is not sent for recognition, and the position returned by the tracker is used instead. This is advantageous because results are provided immediately to the user without the need to transmit the current image to the recognition server.
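The decision described above can be sketched as follows, purely as an illustration. `TrackingEstimate`, `locate_frame`, and the stub callables are hypothetical names introduced here; how the image tracker 240 decides that a frame belongs to the same page is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Position = Tuple[float, float]


@dataclass
class TrackingEstimate:
    """Hypothetical tracker output for one frame."""
    same_page_as_last_query: bool
    position: Optional[Position]


def locate_frame(
    frame: bytes,
    track: Callable[[bytes], TrackingEstimate],
    recognize: Callable[[bytes], Position],
) -> Tuple[str, Position]:
    """Use the tracker position when possible; otherwise send for recognition."""
    estimate = track(frame)
    if estimate.same_page_as_last_query and estimate.position is not None:
        # No round trip to the recognition server is needed.
        return "tracker", estimate.position
    return "recognition", recognize(frame)


# Stub tracker and recognizer (assumptions) for demonstration.
def on_page(frame: bytes) -> TrackingEstimate:
    return TrackingEstimate(True, (12.0, 30.0))


def off_page(frame: bytes) -> TrackingEstimate:
    return TrackingEstimate(False, None)


def server(frame: bytes) -> Position:
    return (5.0, 5.0)


print(locate_frame(b"frame", on_page, server))
print(locate_frame(b"frame", off_page, server))
```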

In another embodiment, the image tracker 240 receives images and provides image tracking information to the pre-processing server 103 or MMR gateway 104 and MMR matching unit 106 for use during the image recognition process. The image tracker 240 also may combine images or sets of features from images, produce image tracking information, and update image tracking information. The image tracker 240 may perform each of these functions as a whole or some or all of them may be performed by a tracking manager 403, described in further detail in conjunction with FIGS. 4A-4B. When the tracking manager 403 performs at least some of these functions, the image tracker 240 also may provide information regarding how images can be combined, e.g., relative timing of when images were received. The image tracker 240 includes a motion tracker 802, a paper tracker 804, a recognition associator 806, and a drift corrector 808 according to one embodiment.

The motion tracker 802 is software and routines for tracking camera motion (e.g., for a camera on mobile device 102) as a projective transformation across video frames. The motion tracker 802 uses a first video frame as a reference frame, and then outputs information indicating movement of the camera. Thus, the motion tracker 802 provides information about the relative motion of the camera between frames. The motion tracker 802 provides image tracking information as output.
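As a minimal sketch of what tracking camera motion as a projective transformation relative to a reference frame can mean, the example below composes per-frame 3×3 homographies into a single reference-to-current transformation. It assumes the frame-to-frame homographies have already been estimated by some means; the function names are hypothetical and the disclosure does not prescribe this representation.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

IDENTITY: Matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def matmul3(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def accumulate(frame_to_frame: Sequence[Matrix]) -> Matrix:
    """Compose per-frame homographies into one reference-to-current homography."""
    total = IDENTITY
    for h in frame_to_frame:
        total = matmul3(h, total)  # the newest motion is applied last
    return total


def apply_homography(h: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Map a point through a homography, dividing by the projective scale."""
    xh = h[0][0] * x + h[0][1] * y + h[0][2]
    yh = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (xh / w, yh / w)


# Two pure translations (a special case of a projective transformation).
shift_right: Matrix = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shift_down: Matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 3.0], [0.0, 0.0, 1.0]]

reference_to_current = accumulate([shift_right, shift_down])
print(apply_homography(reference_to_current, 1.0, 1.0))
```

Because each step multiplies onto the running product, small estimation errors compound over time, which is one reason a separate drift correction step is useful.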

The paper tracker 804 is software and routines for tracking camera motion with respect to the position of a paper document. The paper tracker 804 uses the plane of the paper document as a reference frame, and outputs information indicating the position of the camera relative to the paper document plane.

The recognition associator 806 is software and routines for associating image tracking information, e.g., as provided by motion tracker 802, with the recognition processes described herein. For example, the recognition associator 806 updates tracking information to reflect an absolute location of a received image on a page as determined by MMR recognition.

The drift corrector 808 is software and routines for correcting accumulated camera drift, e.g., according to the method of FIG. 13. The drift corrector 808 is in communication with the paper tracker 804 to ensure that the document page, location, and viewing area are properly aligned with the paper document, and provides drift correction information to the paper tracker 804.

It should be noted that regardless of whether the first embodiment or the second embodiment of the mobile device 102 is used according to FIGS. 2A and 2B, the mobile device 102 generates a retrieval request that includes: a query image, a user or device ID, a command, and other contact information such as device type, software, plug-ins, location (for example if the mobile device includes a GPS capability), device and status information (e.g., device model, macro lens on/off status, autofocus on/off, vibration on/off, tilt angle, etc.), context-related information (weather at the phone's location, time, date, applications currently running on the phone), user-related information (e.g., id number, preferences, user subscriptions, user groups and social structures, action and action-related meta data such as email actions and emails waiting to be sent), etc.
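For illustration, the retrieval request enumerated above could be modeled as a simple record. The class and field names below are hypothetical; the disclosure lists the kinds of information carried but does not define a wire format or schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class RetrievalRequest:
    """Hypothetical grouping of the fields a retrieval request may carry."""
    query_image: bytes
    user_or_device_id: str
    command: str
    # Optional groups; all keys inside these dicts are illustrative only.
    location: Optional[Tuple[float, float]] = None  # e.g., from GPS
    device_status: Dict[str, Any] = field(default_factory=dict)
    context: Dict[str, Any] = field(default_factory=dict)
    user_info: Dict[str, Any] = field(default_factory=dict)


request = RetrievalRequest(
    query_image=b"\x89PNG...",
    user_or_device_id="device-001",
    command="retrieve",
    device_status={"autofocus": True, "macro_lens": False, "tilt_angle": 4.5},
    context={"time": "10:15", "weather": "overcast"},
)
print(request.command, request.location)
```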

Referring now to FIGS. 2C-2H, various embodiments are shown of a plug-in (client 250) for the mobile device 102, the pre-processing server 103 or MMR gateway 104 and MMR matching unit 106 represented generally as including a server 252 that has various possible configurations. As shown in dotted notation in FIGS. 2C-2H, the image tracker 240 and tracking manager 403 described herein optionally may be included in the various configurations. More particularly, FIGS. 2C-2H illustrate how the components of the plug-in or client 250 can have varying levels of functionality and the server 252 can also have varying levels of functionality that parallel or match with the functionality of the client 250. In the various embodiments of FIGS. 2C-2H, either the client 250 or the server 252 includes: an MMR database 254, a capture module 260 for capturing an image or video, a preprocessing module 262 for processing the image before feature extraction for improved recognition such as quality prediction, a feature extraction module 264 for extracting image features, a retrieval module 266 for using features to retrieve information from the MMR database 254, a send message module 268 for sending messages from the server 252 to the client 250, an action module 270 for performing an action, a preprocessing and prediction module 272 for processing the image prior to feature extraction, a feedback module 274 for presenting information to the user and receiving input, a sending module 276 for sending information from the client 250 to the server 252, and a streaming module 278 for streaming video from the client 250 to the server 252.

FIG. 2C illustrates one embodiment for the client 250 and the server 252 in which the client 250 sends an image or video and/or metadata to the server 252 for processing. In this embodiment, the client 250 includes the capture module 260. The server 252 includes: the MMR database 254, the preprocessing module 262, the feature extraction module 264, the retrieval module 266, the send message module 268 and the action module 270. In this example, most of the image tracking functionality is performed by the tracking manager 403 on the server 252; only the minimum information required from the image tracker 240, e.g., timing information for received images, is provided on the client 250.

FIG. 2D illustrates another embodiment for the client 250 and the server 252 in which the client 250 captures an image or video, runs quality prediction, and sends an image or video and/or metadata to the server 252 for processing. In this embodiment, the client 250 includes: the capture module 260, the preprocessing and prediction module 272, the feedback module 274 and the sending module 276. The server 252 includes: the MMR database 254, the preprocessing module 262, the feature extraction module 264, the retrieval module 266, the send message module 268 and the action module 270. It should be noted that in this embodiment the image sent to the server 252 may be different than the captured image. For example, it may be digitally enhanced, sharpened, or may be just binary data. In this example, the image tracking functionality is shared by the image tracker 240 and the tracking manager 403.

FIG. 2E illustrates another embodiment for the client 250 and the server 252 in which the client 250 captures an image or video, performs feature extraction, and sends image features to the server 252 for processing. In this embodiment, the client 250 includes: the capture module 260, the feature extraction module 264, the preprocessing and prediction module 272, the feedback module 274 and the sending module 276. The server 252 includes: the MMR database 254, the retrieval module 266, the send message module 268, and the action module 270. It should be noted that in this embodiment feature extraction may include preprocessing. After features are extracted, the preprocessing and prediction module 272 may run on these features and if the quality of the features is not satisfactory, the user may be asked to capture another image. In this example, most of the image tracking functionality is provided by the image tracker 240.

FIG. 2F illustrates another embodiment for the client 250 and the server 252 in which the entire retrieval process is performed at the client 250. In this embodiment, the client 250 includes: the capture module 260, the feature extraction module 264, the preprocessing and prediction module 272, the feedback module 274, the sending module 276, the MMR database 254, and the retrieval module 266. The server 252 need only have the action module 270. In this example, all of the image tracking functionality is provided by the image tracker 240.

FIG. 2G illustrates another embodiment for the client 250 and the server 252 in which the client 250 streams video to the server 252. In this embodiment, the client 250 includes the capture module 260 and a streaming module 278. The server 252 includes the MMR database 254, the preprocessing module 262, the feature extraction module 264, the retrieval module 266, the send message module 268, and the action module 270. Although not shown, the client 250 can run a predictor on the captured video stream and provide user feedback on where to point the camera or how to capture better video for retrieval. In a modification of this embodiment, the server 252 streams back information related to the captured video and the client 250 can overlay that information on a video preview screen. In this example, most of the image tracking functionality is provided by the tracking manager 403.

FIG. 2H illustrates another embodiment for the client 250 and the server 252 in which the client 250 runs a recognizer and the server 252 streams MMR database information to a local database operable with the client 250 based upon a first recognition result. This embodiment is similar to that described above with reference to FIG. 2F. For example, the entire retrieval process for one recognition algorithm is run at the client 250. If the recognition algorithm fails, the query is handed to the server 252 for running a more complex retrieval algorithm. In this embodiment, the client 250 includes: the capture module 260, the feature extraction module 264, the preprocessing and prediction module 272, the feedback module 274, the sending module 276, the MMR database 254 (a local version), and the retrieval module 266. The server 252 includes another retrieval module 266, the action module 270, and the MMR database 254 (a complete and more complex version). In one embodiment, if the query image cannot be recognized with the local MMR database 254, the client 250 sends an image for retrieval to the server 252 and that initiates an update of the local MMR database 254. Alternatively, the client 250 may contain an updated version of a database for one recognizer, but if the query image cannot be retrieved from the local MMR database 254, then a database for another retrieval algorithm may be streamed to the local MMR database 254. In this example, all of the image tracking functionality is provided by the image tracker 240.
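The client-first retrieval with server fallback described for FIG. 2H might be sketched as follows. This is a simplified illustration under stated assumptions: features are reduced to string keys, the local and server databases are plain dictionaries, and the "update of the local MMR database 254" is modeled as caching the single returned entry. The function name `retrieve_with_fallback` is hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple


def retrieve_with_fallback(
    features: str,
    local_db: Dict[str, str],
    query_server: Callable[[str], Optional[str]],
) -> Tuple[Optional[str], str]:
    """Try the local database first; on a miss, ask the server and update locally."""
    result = local_db.get(features)
    if result is not None:
        return result, "client"
    result = query_server(features)
    if result is not None:
        # Assumption: the update is modeled as caching this one entry.
        local_db[features] = result
    return result, "server"


local_db = {"feat-a": "page-1"}
server_db = {"feat-a": "page-1", "feat-b": "page-2"}

first = retrieve_with_fallback("feat-a", local_db, server_db.get)
second = retrieve_with_fallback("feat-b", local_db, server_db.get)
third = retrieve_with_fallback("feat-b", local_db, server_db.get)
print(first, second, third)
```

After the second call the local database holds the entry, so the third call is answered at the client without contacting the server.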

Pre-processing Server 103

Referring now to FIG. 3A, one embodiment of the pre-processing server 103 is shown. This embodiment of the pre-processing server 103 comprises an operating system (OS) 301, a controller 303, a communicator 305, a request processor 307, and applications 312, connected to system bus 325. Optionally, the pre-processing server 103 also may include a web server 304, a database 306, and/or a hotspot database 404.

As noted above, one of the primary functions of the pre-processing server 103 is to communicate with many mobile devices 102 to receive retrieval requests and send responses including a status indicator (true=recognized/false=not recognized), a page identification number, a location on the page and other information, such as hotspot data. A single pre-processing server 103 can respond to hundreds or millions of retrieval requests. For convenience and ease of understanding only a single pre-processing server 103 is shown in FIGS. 1A and 3A; however, those skilled in the art will recognize that in other embodiments any number of pre-processing servers 103 may be utilized to service the needs of a multitude of mobile devices 102. More particularly, the pre-processing server 103 system bus 325 is coupled to signal lines 132a-132n for communication with various mobile devices 102. The pre-processing server 103 receives retrieval requests from the mobile devices 102 via signal lines 132a-132n and sends responses back to the mobile devices 102 using the same signal lines 132a-132n. In one embodiment, the retrieval request includes: a command, a user identification number, an image, and other context information. For example, other context information may include: device information such as the make, model or manufacturer of the mobile device 102; location information such as provided by a GPS system that is part of the mobile device or by triangulation; environmental information such as time of day, temperature, weather conditions, lighting, shadows, object information; and placement information such as distance, location, tilt, and jitter.

The pre-processing server 103 also is coupled to signal line 130 for communication with the computer 110. Again, for convenience and ease of understanding only a single computer 110 and signal line 130 are shown in FIGS. 1A and 3A, but any number of computing devices may be adapted for communication with the pre-processing server 103. The pre-processing server 103 facilitates communication between the computer 110 and the operating system (OS) 301, the controller 303, the communicator 305, the request processor 307, and the applications 312. The OS 301, controller 303, communicator 305, request processor 307, and applications 312 are coupled to system bus 325 by signal line 330.

The pre-processing server 103 processes the retrieval request and generates an image query and recognition parameters that are sent via signal line 134, which also is coupled to system bus 325, to the MMR matching unit 106 for recognition. The pre-processing server 103 also receives recognition responses from the MMR matching unit 106 via signal line 134. More specifically, the request processor 307 processes the retrieval request and sends information via signal line 330 to the other components of the pre-processing server 103 as will be described below.

The operating system 301 is preferably a custom operating system that is accessible to computer 110, and otherwise configured for use of the pre-processing server 103 in conjunction with the MMR matching unit 106. In an alternate embodiment, the operating system 301 is one of a conventional type such as WINDOWS®, Mac OS X®, SOLARIS®, or LINUX® based operating systems. The operating system 301 is connected to system bus 325 via signal line 330.

The controller 303 is used to control the other modules 305, 307, 312, per the description of each below. While the controller 303 is shown as a separate module, those skilled in the art will recognize that the controller 303 in another embodiment may be distributed as routines in other modules. The controller 303 is connected to system bus 325 via signal line 330.

The communicator 305 is software and routines for sending data and commands among the pre-processing server 103, mobile devices 102, and MMR matching unit 106. The communicator 305 is coupled to signal line 330 to send and receive communications via system bus 325. The communicator 305 communicates with the request processor 307 to issue image queries and receive results.

The request processor 307 processes the retrieval request received via signal line 330, performing preprocessing and issuing image queries for sending to MMR matching unit 106 via signal line 134. In various embodiments, the preprocessing may include feature extraction and recognition parameter definition. The request processor 307 also sends information via signal line 330 to the other components of the pre-processing server 103. The request processor 307 is connected to system bus 325 via signal line 330.

The one or more applications 312 are software and routines for providing functionality related to the processing of MMR documents. The applications 312 can be any of a variety of types, including, without limitation, drawing applications, word processing applications, electronic mail applications, search applications, financial applications, and business applications adapted to utilize information related to the processing of retrieval requests and delivery of recognition responses such as but not limited to accounting, groupware, customer relationship management, human resources, outsourcing, loan origination, customer care, service relationships, etc. In addition, applications 312 may be used to allow for annotation, linking additional information, audio or video clips, building e-communities or social networks around the documents, and associating educational multimedia with recognized documents.

System bus 325 represents a shared bus for communicating information and data throughout pre-processing server 103. System bus 325 may represent one or more buses including an industry standard architecture (ISA) bus, a peripheral component interconnect (PCI) bus, a universal serial bus (USB), or some other bus known in the art to provide similar functionality. Additional components may be coupled to pre-processing server 103 through system bus 325 according to various embodiments.

The pre-processing server 103 optionally also includes a web server 304, a database 306, and/or a hotspot database 404 according to various embodiments.

The Web server 304 is of a conventional type and is responsible for accepting HTTP requests from web clients and sending responses along with data contents, such as web pages, documents, and linked objects (images, etc.). The Web server 304 is coupled to data store 306 such as a conventional database. The Web server 304 is adapted for communication via signal line 234 to receive HTTP requests from any communication device, e.g., mobile devices 102, across a network such as the Internet. The Web server 304 also is coupled to signal line 330 as described above to receive Web content associated with hotspots for storage in the data store 306 and then for later retrieval and transmission in response to HTTP requests. Those skilled in the art will understand that inclusion of the Web server 304 and data store 306 as part of the pre-processing server 103 is merely one embodiment and that the Web server 304 and the data store 306 may be operational in any number of alternate locations or configurations so long as the Web server 304 is accessible to mobile devices 102 and computers 110 via the Internet.

In one embodiment, the pre-processing server 103 also includes a hotspot database 404. The hotspot database 404 is shown in FIG. 3A with dashed lines to reflect that inclusion in the pre-processing server 103 is an alternate embodiment. The hotspot database 404 is coupled by signal line 436 to receive the recognition responses via line 134. The hotspot database 404 uses these recognition responses to query the database and output via line 432 and system bus 325 the hotspot content corresponding to the recognition responses. This hotspot content is included with the recognition responses sent to the requesting mobile device 102.

MMR Gateway 104

Referring now to FIG. 3B, one embodiment of the MMR gateway 104 is shown. This embodiment of the MMR gateway 104 comprises a server 302, a Web server 304, a data store 306, a portal module 308, a log 310, one or more applications 312, an authentication module 314, an accounting module 316, a mail module 318, and an analytics module 320.

As noted above, one of the primary functions of the MMR gateway 104 is to communicate with many mobile devices 102 to receive retrieval requests and send responses including a status indicator (true=recognized/false=not recognized), a page identification number, a location on the page and other information such as hotspot data. A single MMR gateway 104 can respond to hundreds or millions of retrieval requests. For convenience and ease of understanding only a single MMR gateway 104 is shown in FIGS. 1B and 3B; however, those skilled in the art will recognize that in other embodiments any number of MMR gateways 104 may be utilized to service the needs of a multitude of mobile devices 102. More particularly, the server 302 of the MMR gateway 104 is coupled to signal lines 132a-132n for communication with various mobile devices 102. The server 302 receives retrieval requests from the mobile devices 102 via signal lines 132a-132n and sends responses back to the mobile devices 102 using the same signal lines 132a-132n. In one embodiment, the retrieval request includes: a command, a user identification number, an image and other context information. For example, other context information may include: device information such as the make, model or manufacturer of the mobile device 102; location information such as provided by a GPS system that is part of the mobile device or by triangulation; environmental information such as time of day, temperature, weather conditions, lighting, shadows, object information; and placement information such as distance, location, tilt, and jitter.

The server 302 is also coupled to signal line 130 for communication with the computer 110. Again, for convenience and ease of understanding only a single computer 110 and signal line 130 are shown in FIGS. 1B and 3B, but any number of computing devices may be adapted for communication with the server 302. The server 302 facilitates communication between the computer 110 and the portal module 308, the log module 310 and the applications 312. The server 302 is coupled to the portal module 308, the log module 310 and the applications 312 by signal line 330. As will be described in more detail below, these modules cooperate with the server 302 to present a web portal that provides a user experience for exchanging information. The Web portal 308 can also be used for system monitoring, maintenance and administration.

The server 302 processes the retrieval request and generates an image query and recognition parameters that are sent via signal line 134 to the MMR matching unit 106 for recognition. The server 302 also receives recognition responses from the MMR matching unit 106 via signal line 134. The server 302 also processes the retrieval request and sends information via signal line 330 to the other components of the MMR gateway 104 as will be described below. The server 302 is also adapted for communication with the MMR publisher 108 by signal line 138 and the MMR matching unit 106 via signal line 136. The signal line 138 provides a path for the MMR publisher 108 to send Web content for hotspots to the Web server 304 and to provide other information to the server 302. In one embodiment, the server 302 receives information from the MMR publisher 108 and sends that information via signal line 136 for registration with the MMR matching unit 106.

The Web server 304 is of a conventional type and is responsible for accepting requests from clients and sending responses along with data contents, such as web pages, documents, and linked objects (images, etc.). The Web server 304 is coupled to data store 306 such as a conventional database. The Web server 304 is adapted for communication via signal line 234 to receive HTTP requests from any communication device across a network such as the Internet. The Web server 304 is also coupled to signal line 138 as described above to receive Web content associated with hotspots for storage in the data store 306 and then for later retrieval and transmission in response to HTTP requests. Those skilled in the art will understand that inclusion of the Web server 304 and data store 306 as part of the MMR gateway 104 is merely one embodiment and that the Web server 304 and the data store 306 may be operational in any number of alternate locations or configurations so long as the Web server 304 is accessible to mobile devices 102 and computers 110 via the Internet.

In one embodiment, the portal module 308 is software or routines operational on the server 302 for creation and presentation of the Web portal. The portal module 308 is coupled to signal line 330 for communication with the server 302. In one embodiment, the web portal provides an access point for functionality including administration and maintenance of other components of the MMR gateway 104. In another embodiment, the web portal provides an area where users can share experiences related to MMR documents. In yet another embodiment, the web portal provides an area where users can access business applications and the log 310 of usage.

The log 310 is a memory or storage area for storing a list of the retrieval requests received by the server 302 from mobile devices 102 and all corresponding responses sent by the server 302 to the mobile devices. In another embodiment, the log 310 also stores a list of the image queries generated and sent to the MMR matching unit 106 and the recognition responses received from the MMR matching unit 106. The log 310 is coupled to signal line 330 for access by the server 302.

The one or more business applications 312 are software and routines for providing functionality related to the processing of MMR documents. In one embodiment the one or more business applications 312 are executable on the server 302. The business applications 312 can be any one of a variety of types of business applications adapted to utilize information related to the processing of retrieval requests and delivery of recognition responses such as but not limited to accounting, groupware, customer relationship management, human resources, outsourcing, loan origination, customer care, service relationships, etc.

The authentication module 314 is software and routines for maintaining a list of authorized users and granting access to the MMR system 100b. In one embodiment, the authentication module 314 maintains a list of user IDs and passwords corresponding to individuals who have created an account in the system 100b, and therefore, are authorized to use MMR gateway 104 and the MMR matching unit 106 to process retrieval requests. The authentication module 314 is communicatively coupled by signal line 330 to the server 302. Thus, as the server 302 receives retrieval requests, they can be processed and compared against information in the authentication module 314 before generating and sending the corresponding image query on signal line 134. In one embodiment, the authentication module 314 also generates messages for the server 302 to return to the mobile device 102 in instances when the mobile device is not authorized, the mobile device has not established an account, or the account for the mobile device 102 is locked such as due to abuse or lack of payment.
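As a hedged illustration of checking a retrieval request against account information before an image query is generated, the sketch below returns one of the outcomes named above. The `AuthStatus` enumeration, the `Account` record, and the `check_request` function are hypothetical; password verification is deliberately omitted, and a real deployment would rely on a vetted credential-handling scheme.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class AuthStatus(Enum):
    OK = "ok"
    NO_ACCOUNT = "no account established"
    NOT_AUTHORIZED = "not authorized"
    LOCKED = "account locked"


@dataclass
class Account:
    """Hypothetical account record kept by an authentication module."""
    authorized: bool
    locked: bool = False


def check_request(user_id: str, accounts: Dict[str, Account]) -> AuthStatus:
    """Decide whether a retrieval request may proceed to an image query."""
    account = accounts.get(user_id)
    if account is None:
        return AuthStatus.NO_ACCOUNT
    if account.locked:
        return AuthStatus.LOCKED
    if not account.authorized:
        return AuthStatus.NOT_AUTHORIZED
    return AuthStatus.OK


accounts = {
    "alice": Account(authorized=True),
    "bob": Account(authorized=True, locked=True),
    "carol": Account(authorized=False),
}
for user in ("alice", "bob", "carol", "dave"):
    print(user, check_request(user, accounts).value)
```

Only a request that yields `AuthStatus.OK` would lead to an image query on signal line 134; the other values correspond to the messages returned to the mobile device 102.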

The accounting module 316 is software and routines for performing accounting related to user accounts and use of the MMR system 100b. In one embodiment, the retrieval services are provided under a variety of different economic models such as but not limited to use of the MMR system 100b under a subscription model, a charge per retrieval request model or various other pricing models. In one embodiment, the MMR system 100b provides a variety of different pricing models that are similar to those currently offered for cell phones and data networks. The accounting module 316 is coupled to the server 302 by signal line 330 to receive an indication of any retrieval request received by the server 302. In one embodiment, the accounting module 316 maintains a record of transactions (retrieval requests/recognition responses) processed by the server 302 for each mobile device 102. Although not shown, the accounting module 316 can be coupled to a traditional billing system for the generation of an electronic or paper bill.
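A per-device transaction record under a charge-per-retrieval-request model could look like the following sketch. The `TransactionLedger` class, its methods, and the price are hypothetical and serve only to illustrate the bookkeeping; amounts are kept in integer cents to avoid floating-point rounding.

```python
from collections import defaultdict
from typing import DefaultDict


class TransactionLedger:
    """Hypothetical record of retrieval transactions per mobile device."""

    def __init__(self) -> None:
        self._counts: DefaultDict[str, int] = defaultdict(int)

    def record(self, device_id: str) -> None:
        """Note one processed retrieval request/recognition response pair."""
        self._counts[device_id] += 1

    def count(self, device_id: str) -> int:
        return self._counts.get(device_id, 0)

    def amount_due_cents(self, device_id: str, cents_per_request: int) -> int:
        """Charge-per-request pricing (one of several possible models)."""
        return self.count(device_id) * cents_per_request


ledger = TransactionLedger()
for device in ("phone-1", "phone-1", "phone-2"):
    ledger.record(device)

print(ledger.count("phone-1"), ledger.amount_due_cents("phone-1", 5))
```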

The mail module 318 is software and routines for generating e-mail and other types of communication. The mail module 318 is coupled by signal line 330 to the server 302. In one embodiment, the mobile device 102 can issue retrieval requests that include a command to deliver a document or a portion of a document or other information via e-mail, facsimile or other traditional electronic communication means. The mail module 318 is adapted to generate and send such information from the MMR gateway 104 to an addressee as prescribed by the user. In one embodiment, each user profile has associated addressees which are potential recipients of information retrieved.

The analytics module 320 is software and routines for measuring the behavior of users of the MMR system 100b. The analytics module 320 is also software and routines for measuring the effectiveness and accuracy of feature extractors and recognition performed by the MMR matching unit 106. The analytics module 320 measures use of the MMR system 100b including which images are most frequently included as part of retrieval requests, which hotspot data is most often accessed, the order in which images are retrieved, the first image in the retrieval process, and other key performance indicators used to improve the MMR experience and/or a marketing campaign's audience response. In one embodiment, the analytics module 320 measures metrics of the MMR system 100b and analyzes the metrics used to measure the effectiveness of hotspots and hotspot data. The analytics module 320 is coupled to the server 302, the authentication module 314 and the accounting module 316 by signal line 330. The analytics module 320 is also coupled by the server 302 to signal line 134 and thus can access the components of the MMR matching unit 106 to retrieve recognition parameters, image features, quality recognition scores and any other information generated or used by the MMR matching unit 106. The analytics module 320 can also perform a variety of data retrieval and segmentation based upon parameters or criteria of users, mobile devices 102, page IDs, locations, etc.

In one embodiment, the MMR gateway 104 also includes a hotspot database 404. The hotspot database 404 is shown in FIG. 3B with dashed lines to reflect that inclusion in the MMR gateway 104 is an alternate embodiment. The hotspot database 404 is coupled by signal line 436 to receive the recognition responses via line 134. The hotspot database 404 uses these recognition responses to query the database and output via line 432 the hotspot content corresponding to the recognition responses. This hotspot content is sent to the server 302 so that it can be included with the recognition responses and sent to the requesting mobile device 102.

MMR Matching Unit 106

Referring now to FIGS. 4A and 4B, two embodiments for the MMR matching unit 106 will be described. The basic function of the MMR matching unit 106 is to receive an image query, send the image query for recognition, perform recognition on the images in the image query, retrieve hotspot information, combine the recognition result with hotspot information, and send it back to the pre-processing server 103 or MMR gateway 104.

FIG. 4A illustrates a first embodiment of the MMR matching unit 106. The first embodiment of the MMR matching unit 106 comprises a dispatcher 402, a tracking manager 403, a hotspot database 404, an acquisition unit 406, an image registration unit 408, and a dynamic load balancer 418. The acquisition unit 406 further comprises a plurality of the recognition units 410a-410n and a plurality of index tables 412a-412n. The image registration unit 408 further comprises an indexing unit 414 and a master index table 416.

The dispatcher 402 is coupled to signal line 134 for receiving an image query from and sending recognition results to the pre-processing server 103 or MMR gateway 104. The dispatcher 402 is responsible for assigning and sending an image query to respective recognition units 410a-410n. In one embodiment, the dispatcher 402 receives an image query, generates a recognition unit identification number, and sends the recognition unit identification number and the image query to the acquisition unit 406 for further processing. The dispatcher 402 is coupled to signal line 430 to send the recognition unit identification number and the image query to the recognition units 410a-410n. The dispatcher 402 also receives the recognition results from the acquisition unit 406 via signal line 430. One embodiment for the dispatcher 402 will be described in more detail below with reference to FIG. 5.
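The assignment of an image query to a recognition unit can be sketched as below. The round-robin policy is an assumption chosen for simplicity; the paragraph above does not say how the recognition unit identification number is generated, and the `Dispatcher` class shown here is hypothetical rather than the embodiment of FIG. 5.

```python
import itertools
from typing import Callable, Dict, Tuple


class Dispatcher:
    """Hypothetical dispatcher that assigns queries to units in round-robin order."""

    def __init__(self, recognition_units: Dict[str, Callable[[bytes], str]]) -> None:
        if not recognition_units:
            raise ValueError("at least one recognition unit is required")
        self._units = recognition_units
        # Assumption: cycle through unit IDs in sorted order.
        self._next_unit = itertools.cycle(sorted(recognition_units))

    def dispatch(self, image_query: bytes) -> Tuple[str, str]:
        """Pick a unit ID, forward the query, and return (unit ID, result)."""
        unit_id = next(self._next_unit)
        return unit_id, self._units[unit_id](image_query)


units = {
    "410a": lambda query: "result-from-410a",
    "410b": lambda query: "result-from-410b",
}
dispatcher = Dispatcher(units)
assignments = [dispatcher.dispatch(b"query")[0] for _ in range(3)]
print(assignments)
```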

The tracking manager 403 provides image tracking information to the pre-processing server 103 or MMR gateway 104 and MMR matching unit 106 for use during the image recognition process. The tracking manager 403 also may combine images or sets of features from images, produce image tracking information, and update image tracking information. The tracking manager 403 may perform each of these functions as a whole or some or all of them may be performed by an image tracker 240, as described in conjunction with FIG. 2B. Possible distributions of these functions between the image tracker 240 and the tracking manager 403 are described in conjunction with FIGS. 2C-2H. When the tracking manager 403 performs at least some of these functions, the image tracker 240 also may provide information regarding how images can be combined, e.g., relative timing of when images were received. In addition, the tracking manager 403 provides some post-recognition tracking-related processing. For example, the tracking manager 403 may verify the tracking information by comparing received recognition results for image queries for which tracking information is known. In addition, the tracking manager 403 may receive a recognition result that requires editing in some way, e.g., cropping to correspond to scanned portions of a document page.

An alternate embodiment for the hotspot database 404 has been described above with reference to FIGS. 3A-3B wherein the hotspot database is part of the pre-processing server 103 or MMR gateway 104. However, the preferred embodiment for the hotspot database 404 is part of the MMR matching unit 106 as shown in FIG. 4A. Regardless of the embodiment, the hotspot database 404 has a similar functionality. The hotspot database 404 is used to store hotspot information. Once an image query has been recognized and recognition results are produced, these recognition results are used as part of a query of the hotspot database 404 to retrieve hotspot information associated with the recognition results. The retrieved hotspot information is then output on signal line 134 to the pre-processing server 103 or MMR gateway 104 for packaging and delivery to the mobile device 102. As shown in FIG. 4A, the hotspot database 404 is coupled to the dispatcher 402 by signal line 436 to receive queries including recognition results. The hotspot database 404 is also coupled by signal line 432 and signal line 134 to the pre-processing server 103 or MMR gateway 104 for delivery of query results. The hotspot database 404 is also coupled to signal line 136 to receive new hotspot information for storage from the MMR publisher 108, according to one embodiment.

The acquisition unit 406 comprises the plurality of the recognition units 410 a-410 n and a plurality of index tables 412 a-412 n. Each of the recognition units 410 a-410 n has and is coupled to a corresponding index table 412 a-412 n. In one embodiment, each recognition unit 410/index table 412 pair is on the same server. The dispatcher 402 sends the image query to one or more recognition units 410 a-410 n. In one embodiment that includes redundancy, the image query is sent from the dispatcher 402 to a plurality of recognition units 410 for recognition and retrieval and the index tables 412 a-n index the same data. In the serial embodiment, the image query is sent from the dispatcher 402 to a first recognition unit 410 a. If recognition is not successful on the first recognition unit 410 a, the image query is passed on to a second recognition unit 410 b, and so on. In yet another embodiment, the dispatcher 402 performs some preliminary analysis of the image query and then selects a recognition unit 410 a-410 n best adapted and most likely to be successful at recognizing the image query. Those skilled in the art will understand that there are a variety of configurations for the plurality of recognition units 410 a-410 n and the plurality of index tables 412 a-412 n. Example embodiments for the acquisition unit 406 will be described in more detail below with reference to FIGS. 6A-6B. It should be understood that the index tables 412 a-412 n can be updated at various times as depicted by the dashed lines 434 from the master index table 416.

The image registration unit 408 comprises the indexing unit 414 and the master index table 416. The image registration unit 408 has an input coupled to signal line 136 to receive updated information from the MMR publisher 108, according to one embodiment, and an input coupled to signal line 438 to receive updated information from the dynamic load balancer 418. The image registration unit 408 is responsible for maintaining the master index table 416 and migrating all or portions of the master index table 416 to the index tables 412 a-412 n (slave tables) of the acquisition unit 406. In one embodiment, the indexing unit 414 receives images, unique page IDs, and other information, and converts them into index table information that is stored in the master index table 416. In one embodiment, the master index table 416 also stores the record of what is migrated to the index table 412. The indexing unit 414 also cooperates with the MMR publisher 108 according to one embodiment to maintain a unique page identification numbering system that is consistent across image pages generated by the MMR publisher 108, the image pages stored in the master index table 416, and the page numbers used in referencing data in the hotspot database 404.

One embodiment for the image registration unit 408 is shown and described in more detail below with reference to FIG. 7.

The dynamic load balancer 418 has an input coupled to signal line 430 to receive the query image from the dispatcher 402 and the corresponding recognition results from the acquisition unit 406. The output of the dynamic load balancer 418 is coupled by signal line 438 to an input of the image registration unit 408. The dynamic load balancer 418 provides input to the image registration unit 408 that is used to dynamically adjust the index tables 412 a-412 n of the acquisition unit 406. In particular, the dynamic load balancer 418 monitors and evaluates the image queries that are sent from the dispatcher 402 to the acquisition unit 406 for a given period of time. Based on the usage, the dynamic load balancer 418 provides input to adjust the index tables 412 a-412 n. For example, the dynamic load balancer 418 may measure the image queries for a day. Based on the measured usage for that day, the index tables may be modified and configured in the acquisition unit 406 to match the usage measured by the dynamic load balancer 418.
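One way to picture this usage-based adjustment is a proportional split of recognition unit/index table pairs across content groups. The following Python sketch is illustrative only: the function name `allocate_units`, the group names, and the largest-remainder rounding rule are assumptions and are not specified in this description.

```python
def allocate_units(query_counts: dict[str, int], total_units: int) -> dict[str, int]:
    """Split total_units across content groups in proportion to observed queries.

    Hypothetical helper: the description only says the index tables are
    configured to match measured usage, not how the split is computed.
    """
    total_queries = sum(query_counts.values())
    if total_queries == 0:
        raise ValueError("no queries observed in the measurement period")
    # Exact (fractional) share of units for each group.
    exact = {g: total_units * c / total_queries for g, c in query_counts.items()}
    # Start from the whole-number part of each share.
    alloc = {g: int(x) for g, x in exact.items()}
    leftover = total_units - sum(alloc.values())
    # Hand remaining units to the groups with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda g: exact[g] - alloc[g], reverse=True)
    for g in by_remainder[:leftover]:
        alloc[g] += 1
    return alloc


# One day's measured queries per content group (made-up numbers).
usage = {"current_day": 500, "past_week": 375, "past_year": 125}
print(allocate_units(usage, total_units=8))
```

With these made-up counts, eight units are split four, three, and one across the three groups.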

FIG. 4B illustrates a second embodiment of the MMR matching unit 106. In the second embodiment, many of the components of the MMR matching unit 106 have the same or a similar function to corresponding elements of the first embodiment. Thus, like reference numbers have been used to refer to like components with the same or similar functionality. The second embodiment of the MMR matching unit 106 includes the dispatcher 402, the hotspot database 404, and the dynamic load balancer 418 similar to the first embodiment of the MMR matching unit 106. However, the acquisition unit 406 and the image registration unit 408 differ from those described above with reference to FIG. 4A. In particular, the acquisition unit 406 and the image registration unit 408 utilize a shared SQL database for the index tables and the master table. More specifically, there is the master index table 416 and a mirrored database 418 that includes the local index tables 412 a-n. Moreover, conventional SQL database replication functionality is used to generate the mirror images of the master index table 416 stored in the index tables 412 a-412 n for use in recognition. The image registration unit 408 is configured so that when new images are added to the master index table 416 they are immediately available to all the recognition units 410. This is done by mirroring the master index table 416 across all the local index tables 412 a-n using large RAM (not shown) and database mirroring technology.

Dispatcher 402

Referring now to FIG. 5, an embodiment of the dispatcher 402 is shown. The dispatcher 402 comprises a quality predictor 502, an image feature order unit 504, and a distributor 506. The quality predictor 502, the image feature order unit 504, and the distributor 506 are coupled to signal line 532 to receive image queries from the pre-processing server 103 or MMR gateway 104.

The quality predictor 502 receives image queries and generates a recognizability score used by the dispatcher 402 to route the image query to one of the plurality of recognition units 410. The dispatcher 402 also receives recognition results from the recognition units 410 on signal line 530. The recognition results include a Boolean value (true/false) and, if true, a page ID and a location on the page. In one embodiment, the dispatcher 402 merely receives and retransmits the data to the pre-processing server 103 or MMR gateway 104.

One embodiment of the quality predictor 502 comprises recognition algorithm parameters 552, a vector calculator 554, a score generator 556 and a scoring module 558. The quality predictor 502 has inputs coupled to signal line 532 to receive an image query, context and metadata, and device parameters. The image query may be video frames, a single frame, or image features. The context and metadata includes time, date, location, environmental conditions, etc. The device parameters include brand, type, macro block on/off, gyro or accelerometer reading, aperture, time, exposure, flash, etc. Additionally, the quality predictor 502 uses certain parameters of the recognition algorithm parameters 552. These recognition algorithm parameters 552 can be provided to the quality predictor 502 from the acquisition unit 406 or the image registration unit 408. The vector calculator 554 computes quality feature vectors from the image to measure its content and distortion, such as its blurriness, existence and amount of recognizable features, luminosity, etc. The vector calculator 554 computes any number of quality feature vectors from one to n. In some cases, the vector calculator 554 requires knowledge of the recognition algorithm(s) to be used, and the vector calculator 554 is coupled by signal line 560 to the recognition algorithm parameters 552. For example, if an Invisible Junctions algorithm is employed, the vector calculator 554 computes how many junction points are present in the image as a measure of its recognizability. All or some of these computed features are then input to the score generator 556 via signal line 564. The score generator 556 is also coupled by signal line 562 to receive the recognition algorithm parameters 552. The output of the score generator 556 is provided to the scoring module 558. The scoring module 558 applies weights to the scores provided by the score generator 556 and generates a recognition score from them. In one embodiment, the result is a single recognizability score. In another embodiment, the result is a plurality of recognizability scores ranked from highest to lowest.
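The weighting step can be sketched as a weighted average of normalized quality features. This Python sketch is an assumption-laden illustration: the class `QualityFeatures`, its field names, and the weight values are hypothetical, since the description does not fix a feature set or any weights.

```python
from dataclasses import dataclass


@dataclass
class QualityFeatures:
    """Hypothetical quality features, each normalized to the range 0..1."""
    sharpness: float        # 1.0 means no measurable blur
    feature_density: float  # share of expected recognizable features found
    luminosity: float       # 1.0 means well lit


# Illustrative weights only; the description does not specify any values.
WEIGHTS = {"sharpness": 0.5, "feature_density": 0.3, "luminosity": 0.2}


def recognizability_score(q: QualityFeatures, weights: dict = WEIGHTS) -> float:
    """Combine per-feature scores into one weighted score in 0..1."""
    total = sum(weights.values())
    weighted = sum(getattr(q, name) * w for name, w in weights.items())
    return weighted / total


def rank_scores(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Rank several scores (e.g. one per algorithm) from highest to lowest."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


frame = QualityFeatures(sharpness=0.8, feature_density=0.5, luminosity=1.0)
print(recognizability_score(frame))
print(rank_scores({"algorithm_a": 0.2, "algorithm_b": 0.9}))
```

`recognizability_score` corresponds to the single-score embodiment, and `rank_scores` to the embodiment that returns several scores ordered from highest to lowest.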

The image feature order unit 504 receives image queries and outputs an ordering signal. The image feature order unit 504 analyzes an input image query and predicts the time required to recognize an image by analyzing the image features it contains. The difference between the actual recognition time and the predicted time is used to adjust future predictions, thereby improving accuracy. In the simplest of embodiments, simple images with few features are assigned to lightly loaded recognition units 410 so that they will be recognized quickly and the user will see the answer immediately. In one embodiment, the features used by the image feature order unit 504 to predict the time are different than the features used by recognition units 410 for actual recognition. For example, the number of corners detected in an image is used to predict the time required to analyze the image. The feature set used for prediction need only be correlated with the actual recognition time. In one embodiment, several different feature sets are used and their correlations to recognition time are measured over some period. Eventually, the feature set that is the best predictor and lowest cost (most efficient) would be determined and the other feature sets could be discarded.
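The feedback loop between predicted and actual recognition time can be sketched as a simple online estimator. The linear model (time proportional to a feature count such as the number of corners), the class name, and the learning rate below are assumptions chosen for illustration; the description does not prescribe a particular model.

```python
class RecognitionTimePredictor:
    """Predicts recognition time from a cheap feature count (e.g. corners).

    Hypothetical sketch: assumes time grows linearly with the feature count.
    """

    def __init__(self, seconds_per_feature: float = 0.001, learning_rate: float = 0.2):
        self.seconds_per_feature = seconds_per_feature
        self.learning_rate = learning_rate

    def predict(self, feature_count: int) -> float:
        """Return the expected recognition time in seconds."""
        return self.seconds_per_feature * feature_count

    def update(self, feature_count: int, actual_seconds: float) -> float:
        """Nudge the model toward the observed time; return the prediction error."""
        error = actual_seconds - self.predict(feature_count)
        if feature_count > 0:
            self.seconds_per_feature += self.learning_rate * error / feature_count
        return error


predictor = RecognitionTimePredictor(seconds_per_feature=0.001, learning_rate=0.5)
first_error = predictor.update(1000, actual_seconds=2.0)
second_error = predictor.update(1000, actual_seconds=2.0)
print(first_error, second_error)
```

Each call to `update` moves the estimate part of the way toward the observed time, so repeated observations of the same kind of image shrink the error.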

The distributor 506 is also coupled to receive the output of the quality predictor 502 and image feature order unit 504. The distributor 506 includes a FIFO queue 508 and a controller 510. The distributor 506 generates an output on signal line 534 that includes the image query and a recognition unit identification number (RUID). Those skilled in the art will understand that in other embodiments the image query may be directed to any particular recognition unit using a variety of means other than the RUID. As image queries are received on the signal line 532, the distributor 506 receives the image queries and places them in the order in which they are received into the FIFO queue 508. The controller 510 receives a recognizability score for each image query from the quality predictor 502 and also receives an ordering signal from the image feature order unit 504. Using this information from the quality predictor 502 and the image feature order unit 504, the controller 510 selects image queries from the FIFO queue 508, assigns them to particular recognition units 410 and sends the image query to the assigned recognition unit 410 for processing. The controller 510 maintains a list of image queries assigned to each recognition unit 410 and the expected time to completion for each image (as predicted by the image feature order unit 504). The total expected time to empty the queue for each recognition unit 410 is the sum of the expected times for the images assigned to it. The controller 510 can execute several queue management strategies. In a simple assignment strategy, image queries are removed from the FIFO queue 508 in the order they arrived and assigned to the first available recognition unit 410. In a balanced response strategy, the total expected response time to each query is maintained at a uniform level and query images are removed from the FIFO queue 508 in the order they arrived, and assigned to the FIFO queue 508 for a recognition unit so that its total expected response time is as close as possible to the other recognition units. In an easy-first strategy, images are removed from the FIFO queue 508 in an order determined by their expected completion times: images with the smallest expected completion times are assigned to the first available recognition unit. In this way, users are rewarded with faster response time when they submit an image that is easy to recognize. This could incentivize users to carefully select the images they submit. Other queue management strategies are possible.
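The balanced response and easy-first strategies can be compared in a small static sketch. The function `assign_queries` and its data shapes are hypothetical, and the sketch assumes that the "first available" recognition unit is the one with the smallest total expected time, which is a simplification of a live system.

```python
def assign_queries(queries, unit_ids, easy_first=False):
    """Assign queued image queries to recognition units.

    queries: list of (query_id, expected_seconds) in arrival order.
    unit_ids: recognition unit identifiers (stand-ins for RUIDs).
    Returns (plan, load): the (query_id, unit_id) assignments in processing
    order and each unit's total expected time.
    """
    load = {u: 0.0 for u in unit_ids}
    # Easy-first reorders by expected completion time; otherwise keep FIFO order.
    order = sorted(queries, key=lambda q: q[1]) if easy_first else list(queries)
    plan = []
    for query_id, expected in order:
        # Pick the least-loaded unit; ties go to the first unit listed.
        unit = min(unit_ids, key=lambda u: load[u])
        load[unit] += expected
        plan.append((query_id, unit))
    return plan, load


queue = [("q1", 5.0), ("q2", 1.0), ("q3", 2.0)]
print(assign_queries(queue, ["A", "B"]))                   # balanced, FIFO order
print(assign_queries(queue, ["A", "B"], easy_first=True))  # easy-first order
```

In FIFO order the slow query q1 is dispatched first, whereas easy-first dispatches q2 and q3 ahead of it, at the cost of a less even total load in this example.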

Acquisition Unit 406

Referring now to FIGS. 6A and 6B, embodiments of the acquisition unit 406 will be described.

FIG. 6A illustrates one embodiment for the acquisition unit 406 where the recognition unit 410 and index table 412 pairs are partitioned based on the content or images that they index. This configuration is particularly advantageous for mass media publishers that provide content on a periodic basis. The organization of the content in the index tables 412 can be partitioned such that the content most likely to be accessed will be available on the greatest number of recognition unit 410 and index table 412 pairs. Those skilled in the art will recognize that the partition described below is merely one example and that various other partitions based on actual usage statistics measured over time can be employed. As shown in FIG. 6A, the acquisition unit 406 comprises a plurality of recognition units 410 a-h and a plurality of index tables 412 a-h. The plurality of recognition units 410 a-h is coupled to signal line 430 to receive image queries from the dispatcher 402. Each of the plurality of recognition units 410 a-h is coupled to a corresponding index table 412 a-h. The recognition units 410 extract features from the image query and compare those image features to the features stored in the index table to identify a matching page and location on that page.

Example recognition and retrieval systems and methods are disclosed in U.S. patent application Ser. No. 11/461,017, titled “System And Methods For Creation And Use Of A Mixed Media Environment,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/461,279, titled “Method And System For Image Matching In A Mixed Media Environment,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/461,286, titled “Method And System For Document Fingerprinting Matching In A Mixed Media Environment,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/461,294, titled “Method And System For Position-Based Image Matching In A Mixed Media Environment,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/461,300, titled “Method And System For Multi-Tier Image Matching In A Mixed Media Environment,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/461,147, titled “Data Organization and Access for Mixed Media Document System,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/461,164, titled “Database for Mixed Media Document System,” filed Jul. 31, 2006; U.S. patent application Ser. No. 11/461,109, titled “Searching Media Content For Objects Specified Using Identifiers,” filed Jul. 31, 2006; U.S. patent application Ser. No. 12/059,583, titled “Invisible Junction Feature Recognition For Document Security Or Annotation,” filed Mar. 31, 2008; U.S. patent application Ser. No. 12/121,275, titled “Web-Based Content Detection In Images, Extraction And Recognition,” filed May 15, 2008; U.S. patent application Ser. No. 11/776,510, titled “Invisible Junction Features For Patch Recognition,” filed Jul. 11, 2007; U.S. patent application Ser. No. 11/776,520, titled “Information Retrieval Using Invisible Junctions and Geometric Constraints,” filed Jul. 11, 2007; U.S. patent application Ser. No. 11/776,530, titled “Recognition And Tracking Using Invisible Junctions,” filed Jul. 11, 2007; U.S. patent application Ser. No. 11/777,142, titled “Retrieving Documents By Converting Them to Synthetic Text,” filed Jul. 12, 2007; and U.S. patent application Ser. No. 11/624,466, titled “Synthetic Image and Video Generation From Ground Truth Data,” filed Jan. 18, 2007; which are incorporated by reference in their entirety.

As shown in FIG. 6A, the recognition unit 410/index table 412 pairs are hierarchical, and grouped according to the content that is in the index tables 412. In particular, the first group 602 of recognition units 410 a-d and index tables 412 a-d is used to index the pages of a publication such as a newspaper for a current day according to one embodiment. For example, four of the eight recognition units 410 are used to index content from the current day's newspaper because most of the retrieval requests are likely to be related to the newspaper that was published in the last 24 hours. A second group 604 of recognition units 410 e-g and corresponding index tables 412 e-g is used to store pages of the newspaper from recent past days, for example the past week. A third group 606 of recognition unit 410 h and index table 412 h is used to store pages of the newspaper from older past days, for example for the past year. This allows the organizational structure of the acquisition unit 406 to be optimized to match the profile of retrieval requests received. Moreover, the operation of the acquisition unit 406 can be modified such that a given image query is first sent to the first group 602 for recognition, and if the first group 602 is unable to recognize the image query, it is sent to the second group 604 for recognition and so on.
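The group-by-group fallback can be sketched as follows. Everything named here is hypothetical: each recognition unit is modeled as a toy lookup built by `make_unit`, and the units within a group are tried one after another, whereas a real acquisition unit would run feature extraction and index retrieval and might query a group's units in parallel.

```python
from typing import Callable, Optional

# A stand-in recognition unit: returns a page ID, or None if nothing matches.
Recognizer = Callable[[str], Optional[str]]


def make_unit(index: dict[str, str]) -> Recognizer:
    """Build a toy recognition unit backed by a dict used as its index table."""
    return lambda image_query: index.get(image_query)


def recognize_tiered(image_query: str, groups: list[tuple[str, list[Recognizer]]]):
    """Try each group in priority order; return (group_name, page_id) or None."""
    for group_name, units in groups:
        for recognize in units:
            page_id = recognize(image_query)
            if page_id is not None:
                return group_name, page_id
    return None


groups = [
    ("current_day", [make_unit({"query-a": "page-1"})]),
    ("past_week", [make_unit({"query-b": "page-7"})]),
    ("past_year", [make_unit({})]),
]
print(recognize_tiered("query-a", groups))
print(recognize_tiered("query-b", groups))
print(recognize_tiered("query-z", groups))
```

A query that matches current-day content never reaches the older groups, which is what keeps the common case fast.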

It should be noted that the use of four recognition units 410 and index tables 412 as the first group 602 is merely by way of example and used to demonstrate a relative proportion as compared with the number of recognition units 410 and index tables 412 in the second group 604 and the third group 606. The number of recognition units 410 and index tables 412 in any particular group 602, 604 and 606 may be modified based on the total number of recognition units 410 and index tables 412. Furthermore, the number of recognition units 410 and index tables 412 in any particular group 602, 604, and 606 may be adapted so that it matches the profile of all users sending retrieval requests to the acquisition unit 406 for a given publication.

Alternatively, the recognition unit 410 and index table 412 pairs may be partitioned such that there is overlap in the documents they index, e.g., segments of a single image according to content type. In this example, image queries are sent to index tables 412 in parallel rather than serially.

FIG. 6B illustrates a second embodiment for the acquisition unit 406 where the recognition units 410 and index tables 412 are partitioned based upon the type of recognition algorithm they implement. In the second embodiment, the recognition units 410 are also coupled such that the failure of a particular recognition unit to generate a recognition result causes the input image query to be sent to another recognition unit for processing. Furthermore, in the second embodiment, the index tables 412 include feature sets that are varied according to different device and environmental factors of image capture devices (e.g., blur, etc.).

The second embodiment of the acquisition unit 406 includes a plurality of recognition units 410 a-410 e, a plurality of the index tables 412 a-412 e and a result combiner 610. In this embodiment, the recognition units 410 a-410 e utilize different types of recognition algorithms. For example, recognition units 410 a, 410 b, and 410 c use a first recognition algorithm; recognition unit 410 d uses a second recognition algorithm; and recognition unit 410 e uses a third recognition algorithm for recognition and retrieval of page numbers and locations. Recognition units 410 a, 410 d, and 410 e each have an input coupled to signal line 430 by signal line 630 for receiving the image query. The recognition results from each of the plurality of recognition units 410 a-410 e are sent via signal lines 636, 638, 640, 642, and 644 to the result combiner 610. The output of the result combiner 610 is coupled to signal line 430.

In one embodiment, the recognition units 410 a, 410 b, and 410 c cooperate together with index tables 1, 2, and 3, 412 a-412 c, each storing image features corresponding to the same pages but with various modifications, e.g., due to different device and environmental factors. For example, index table 1 412 a may store image features for pristine images of pages such as from a PDF document, while index table 2 412 b stores images of the same pages but with a first level of modification, and index table 3 412 c stores images of the same pages but with a second level of modification. In one embodiment, the index tables 1, 2, and 3, 412 a-412 c are quantization trees. The first recognition unit 410 a receives the image query via signal line 630. The first recognition unit 410 a comprises a first type of feature extractor 602 and a retriever 604 a. The first type of feature extractor 602 receives the image query, extracts the Type 1 features, and provides them to the retriever 604 a. The retriever 604 a uses the extracted Type 1 features and compares them to the index table 1 412 a. If the retriever 604 a identifies a match, the retriever 604 a sends the recognition results via signal line 636 to the result combiner 610. If, however, the retriever 604 a is unable to identify a match or identifies a match with low confidence, the retriever 604 a sends the extracted Type 1 features to the retriever 604 b of the second recognition unit 410 b via signal line 632. It should be noted that since the Type 1 features already have been extracted, the second recognition unit 410 b does not require a feature extractor 602. The second recognition unit 410 b performs retrieval functions similar to the first recognition unit 410 a, but cooperates with index table 2 412 b that has Type 1 features for slightly modified images. If the retriever 604 b identifies a match, the retriever 604 b sends the recognition results via signal line 638 to the result combiner 610. If the retriever 604 b of the second recognition unit 410 b is unable to identify a match or identifies a match with low confidence, the retriever 604 b sends the extracted features to the retriever 604 c of the third recognition unit 410 c. Although index tables with particular levels of modification are provided, this is only by way of example, and any number of additional levels of modification from 0 to n may be used.

The recognition units 410 d and 410 e operate in parallel with the other recognition units 410 a-c. The fourth recognition unit 410 d comprises a second type of feature extractor 606 and a retriever 604 d. The Type 2 feature extractor 606 receives the image query and bounding boxes or other feature identifiers, parses the bounding boxes or other feature identifiers, and generates Type 2 coding features. These Type 2 features are provided to the retriever 604 d and the retriever 604 d compares them to the features stored in index table 4 412 d. In one embodiment, index table 4 412 d is a hash table. The retriever 604 d identifies any matching pages and returns the recognition results to the result combiner 610 via signal line 642. The fifth recognition unit 410 e operates in a similar manner but for a third type of feature extraction. The fifth recognition unit 410 e comprises a Type 3 feature extractor 608 and a retriever 604 e. The Type 3 feature extractor 608 receives the image query and bounding boxes or other feature identifiers, parses the image, and generates Type 3 features, which are provided to the retriever 604 e; the retriever 604 e compares them to features stored in the index table 5 412 e. In one embodiment, the index table 5 412 e is a SQL database of character strings. The retriever 604 e identifies any matching strings and returns the recognition results to the result combiner 610 via signal line 644.

In one exemplary embodiment, the three types of feature extraction include an invisible junction recognition algorithm, brick wall coding, and path coding.

The result combiner 610 receives recognition results from the plurality of recognition units 410 a-e and produces one or a small list of matching results. In one embodiment, each of the recognition results includes an associated confidence factor. In another embodiment, context information such as date, time, location, personal profile, or retrieval history is provided to the result combiner 610. These confidence factors along with other information are used by the result combiner 610 to select the recognition results most likely to match the input image query.
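A confidence-based combiner can be sketched as below. The record type `RecognitionResult`, the function `combine_results`, and the rule of keeping the best result per page are assumptions for illustration; the description leaves the selection logic open and also allows context information to be used, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass
class RecognitionResult:
    """Hypothetical result record from one recognition unit."""
    page_id: str
    location: tuple[int, int]  # position on the page
    confidence: float          # confidence factor in 0..1


def combine_results(results, max_results=3, min_confidence=0.0):
    """Keep the most confident result per page, ordered by confidence."""
    best = {}
    for r in results:
        if r.confidence < min_confidence:
            continue  # drop results below the threshold
        current = best.get(r.page_id)
        if current is None or r.confidence > current.confidence:
            best[r.page_id] = r
    ranked = sorted(best.values(), key=lambda r: r.confidence, reverse=True)
    return ranked[:max_results]


results = [
    RecognitionResult("page-1", (10, 20), 0.6),
    RecognitionResult("page-2", (5, 5), 0.9),
    RecognitionResult("page-1", (12, 18), 0.8),
]
for r in combine_results(results, max_results=2):
    print(r.page_id, r.confidence)
```

The output is a short list with at most one entry per page, which matches the "one or a small list of matching results" described above.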

The above described embodiments are not meant to be exclusive or limiting, and may be combined according to other embodiments. For example, in other embodiments, the acquisition unit 406 has recognition unit 410 and index table 412 pairs partitioned in different manners, e.g., into one or more higher priority indexes and one or more general indexes; with at least one recognition unit 410 and index table 412 pair partitioned by mobile device 102 user; with indexes partitioned by geographical location; and/or partitioned such that a recognition unit 410 and index table 412 pair is included on the mobile device 102.

Image Registration Unit 408

FIG. 7 shows an embodiment of the image registration unit 408. The image registration unit 408 comprises an image alteration generator 703, a plurality of Type 1 feature extractors 704 a-c, a plurality of Type 1 index table updaters 706 a-c, a Type 2 feature extractor 708, a Type 2 index table updater 710, a Type 3 feature extractor 712, a Type 3 index table updater 714 and a plurality of master index tables 416 a-e. The image registration unit 408 also includes other control logic (not shown) that controls the updating of the working index tables 411-413 from the master index table 416. The image registration unit 408 can update the index tables 411-413 of the acquisition unit 406 in a variety of different ways based on various criteria such as performing updates on a periodic basis, performing updates when new content is added, performing updates based on usage, performing updates for storage efficiency, etc.

The image alteration generator 703 has an input coupled to signal line 730 to receive an image and a page identification number. The image alteration generator 703 has a plurality of outputs and each output is coupled by signal lines 732, 734, and 736 to Type 1 feature extractors 704 a-c, respectively. The image alteration generator 703 passes a pristine image and the page identification number to the output on signal line 732. The image alteration generator 703 then generates a first altered image and outputs it and the page identification number on signal line 734 to Type 1 feature extractor 704 b, and generates a second altered image, altered differently than the first altered image, and outputs it and the page identification number on signal line 736 to Type 1 feature extractor 704 c.

The Type 1 feature extractors 704 receive the image and page ID, extract the Type 1 features from the image and send them along with the page ID to a respective Type 1 index table updater 706. The outputs of the plurality of Type 1 feature extractors 704 a-c are coupled to inputs of the plurality of Type 1 index table updaters 706 a-c. For example, the output of Type 1 feature extractor 704 a is coupled to an input of Type 1 index table updater 706 a. The remaining Type 1 feature extractors 704 b-c similarly are coupled to respective Type 1 index table updaters 706 b-c. The Type 1 index table updaters 706 are responsible for formatting the extracted features and storing them in a corresponding master index table 416. While the master index table 416 is shown as five separate master index tables 416 a-e, those skilled in the art will recognize that all the master index tables could be combined into a single master index table or into a few master index tables. In the embodiment including the MMR publisher 108, once the Type 1 index table updaters 706 have stored the extracted features in the index table 416, they issue a confirmation signal that is sent via signal lines 740 and 136 back to the MMR publisher 108.

The Type 2 feature extractor 708 and the Type 3 feature extractor 712 operate in a similar fashion and are coupled to signal line 738 to receive the image, a page identification number, and possibly other image information. The Type 2 feature extractor 708 extracts information from the input needed to update its associated index table 416 d. The Type 2 index table updater 710 receives the extracted information from the Type 2 feature extractor 708 and stores it in the index table 416 d. The Type 3 feature extractor 712 and the Type 3 index table updater 714 operate in a like manner but for Type 3's feature extraction algorithm. The Type 3 feature extractor 712 also receives the image, a page number, and possibly other image information via signal line 738. The Type 3 feature extractor 712 extracts Type 3 information and passes it to the Type 3 index table updater 714. The Type 3 index table updater 714 stores the information in index table 5 416 e. The architecture of the registration unit 408 is particularly advantageous because it provides an environment in which the index tables can be automatically updated, simply by providing images and page numbers to the image registration unit 408. According to one embodiment, Type 1 feature extraction is invisible junction recognition, Type 2 feature extraction is brick wall coding, and Type 3 feature extraction is path coding.

Methods

FIG. 9 is a flowchart of an example method for generating and sending a retrieval request and processing the retrieval request with an MMR system 100. The method begins with the mobile device 102 capturing 902 an image. A retrieval request that includes the image, a user identifier, and other context information is generated by the mobile device 102 and sent 904 to the pre-processing server 103 or MMR gateway 104. The pre-processing server 103 or MMR gateway 104 processes 906 the retrieval request by extracting the user identifier from the retrieval request and verifying that it is associated with a valid user. The pre-processing server 103 or MMR gateway 104 also performs other processing such as recording the retrieval request in the log 310, performing any necessary accounting associated with the retrieval request and analyzing any MMR analytics metrics. Next, the pre-processing server 103 or MMR gateway 104 generates 908 an image query and sends it to the dispatcher 402. The dispatcher 402 performs load-balancing and sends the image query to the acquisition unit 406. In one embodiment, the dispatcher 402 specifies the particular recognition unit 410 of the acquisition unit 406 that should process the image query. Then the acquisition unit 406 performs 912 image recognition to produce recognition results. The recognition results are returned 914 to the dispatcher 402 and in turn the pre-processing server 103 or MMR gateway 104. The recognition results are also used to retrieve 916 hotspot data corresponding to the page and location identified in the recognition results. Finally, the hotspot data and the recognition results are sent 918 from the pre-processing server 103 or MMR gateway 104 to the mobile device 102.

Image Tracking-Assisted MMR Recognition

In some instances, a mobile device 102 user will be interested in multiple pieces of information from the same document page. For example, a user may move his mobile device 102 over a document page and submit multiple images for recognition. FIG. 10 is a flowchart of an example method of image tracking-assisted recognition. The method is particularly advantageous because the addition of image tracking improves the accuracy and speed of MMR recognition. Specifically, using image tracking, e.g., via image tracker 240 and/or tracking manager 403, on the video stream of the mobile device 102 allows for a determination of whether two frames exist on the same page, and if so the relative positions of the content of each. The method begins with receiving 1002 an image, e.g., as captured by mobile device 102. In addition, image tracking information is received 1004 indicating a location of the received image relative to a location associated with a previously received image for which recognition has been performed. For example, such image tracking information may be provided by image tracker 240, alone or in conjunction with tracking manager 403. The method then proceeds according to one of two different paths. In one embodiment, an image query is formed and submitted 1006 corresponding to the received image, and a recognition result is received 1008 associated with the image query for the received image. Then, the recognition result associated with the previously received image is retrieved 1010. In this example, the tracking information received is verified 1012 by comparing the two recognition results, e.g., by tracking manager 403. The process then ends for this path. This embodiment imparts additional accuracy to the process. Because more than one frame obtained from the same document page is submitted for recognition, even though the image tracking information indicates that they are from the same document page, the recognition results for the two images can be compared to the image tracking information for verification. For example, tracking information could be calculated from the respective recognition results, which, if inconsistent with the tracking information received from the image tracker 240, could be used to update the image tracker information.
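The verification step 1012 can be illustrated with a short Python sketch. The `Recognition` type, the function `verify_tracking`, and the pixel tolerance `tol` are hypothetical; the sketch assumes each recognition result yields a page identifier and an (x, y) location on that page, and it compares translation only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recognition:
    page_id: str
    x: float
    y: float


def verify_tracking(prev, curr, tracked_offset, tol=5.0):
    """Compare the tracker's offset with the offset implied by two recognition results.

    Returns (consistent, offset_from_recognition). The second value could be
    used to update the tracker when the two sources disagree.
    """
    if prev.page_id != curr.page_id:
        return False, None
    dx, dy = curr.x - prev.x, curr.y - prev.y
    consistent = (abs(dx - tracked_offset[0]) <= tol
                  and abs(dy - tracked_offset[1]) <= tol)
    return consistent, (dx, dy)
```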

According to another embodiment, after receiving 1004 the image tracking information, and in response to the image tracking information indicating that a document page includes the locations of both images, the previously received image recognition result is retrieved 1010. In this example, no image query is submitted corresponding to the received image, since the tracking has indicated that the received image is on the same page as the previously received image, and thus the recognition result is the same. Therefore, if the first (previous) received image was submitted for recognition, it is unnecessary to submit the second image.
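This second path amounts to reusing a stored result whenever tracking reports the same page. A minimal Python sketch follows; the function name, the `cache` dictionary, and the `submit_query` callable are hypothetical placeholders for the retrieval 1010 and submission 1006 steps.

```python
def recognize_with_tracking(image, same_page, cache, submit_query):
    """Return a recognition result, skipping the query when tracking says
    the image lies on the same page as the previously recognized image."""
    if same_page and cache.get("last") is not None:
        return cache["last"]              # 1010: retrieve the previous result
    cache["last"] = submit_query(image)   # otherwise submit a new image query
    return cache["last"]
```

The saving is one recognition round trip per tracked frame, which is where the speed improvement mentioned for FIG. 10 would come from.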

FIG. 11 is a flowchart showing an example method of recognition of a plurality of received images using a single image query. The method begins with receiving 1102 a plurality of images, e.g., from a single document page, captured by mobile device 102. For example, the images may be taken from a video stream. In addition, image tracking information is received 1104 indicating relative locations of the received images on the document page, e.g., from image tracker 240, alone or in conjunction with tracking manager 403. The image tracking information also may include instructions for combining received images. From this information, a single image query is submitted 1106 for the plurality of images. This step 1106 includes additional sub-steps according to various embodiments. The received images are combined 1108 according to one embodiment to form a merged image, e.g., by image tracker 240 and/or tracking manager 403. Information regarding how the received images are to be combined may be received from an image tracker 240 on the mobile device 102 and/or a tracking manager 403. The merged image may include the combined image stitched together from the plurality of images, e.g., according to methods known in the art of creating panoramas. This method is advantageous because the merged image contains a larger area of the document, and thus provides more context, than any of the individual images received, resulting in improved recognition accuracy.

Alternatively, the merged image may be a super-resolution image synthesized from the combination of the plurality of received images according to methods known in the art, e.g., performed by image tracker 240 and/or tracking manager 403. This method is advantageous because features extracted from the higher resolution merged image more closely resemble the features of the indexed high resolution image, thus improving recognition accuracy. In one embodiment, the merge 1108 may occur on the mobile device 102. In another embodiment, the merge 1108 occurs at the server (e.g., pre-processing server 103 or MMR gateway 104, or MMR matching unit 106). In this example, tracking information may be sent from the image tracker 240 on the mobile device 102, from a tracking manager 403 on the server 252 (e.g., using relative timing information received from the mobile device 102), or a combination of these. For example, the image tracker 240 on the mobile device 102 may provide sequence and timing information for a plurality of images, and the tracking manager 403 may provide the image tracking information as calculated therefrom. FIG. 12A shows an example of four received images 1202 and the merged image 1204 formed from the combination of the received images 1202 according to the method of FIG. 11. From the combined image, the single image query is formed 1110. That is, a synthetic patch is created by combining multiple image patches, upon which MMR recognition is performed.
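The combining step 1108 can be illustrated with a deliberately simplified Python sketch. It assumes the tracking information reduces to a (row, column) offset for each image in a shared frame, and it pastes each patch onto a common canvas; the function name `merge_patches` is hypothetical. Actual panorama stitching or super-resolution synthesis would also handle rotation, scale, and blending, none of which is modeled here.

```python
def merge_patches(patches, offsets):
    """Paste 2D patches (lists of pixel rows) onto one canvas.

    offsets holds the top-left (row, col) of each patch in a shared frame,
    as would be derived from image tracking information.
    """
    min_r = min(r for r, _ in offsets)
    min_c = min(c for _, c in offsets)
    height = max(r - min_r + len(p) for p, (r, _) in zip(patches, offsets))
    width = max(c - min_c + len(p[0]) for p, (_, c) in zip(patches, offsets))
    canvas = [[None] * width for _ in range(height)]   # None marks uncovered pixels
    for patch, (r, c) in zip(patches, offsets):
        for i, row in enumerate(patch):
            for j, value in enumerate(row):
                # Later patches overwrite earlier ones where they overlap.
                canvas[r - min_r + i][c - min_c + j] = value
    return canvas


# Four 2x2 patches arranged in a 2x2 grid, loosely analogous to FIG. 12A.
tiles = [[[k, k], [k, k]] for k in (1, 2, 3, 4)]
merged = merge_patches(tiles, [(0, 0), (0, 2), (2, 0), (2, 2)])
```

The merged canvas covers a larger area than any single patch, which is the property the text credits for the improved recognition accuracy.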

Alternatively, multiple images received can be used to provide recognition results without combining them into a single image. Thus, according to another embodiment, a set of features is extracted 1112 from each of the plurality of images. The sets of features then are combined 1114 to form a single spatially-consistent superset, e.g., by image tracker 240 and/or tracking manager 403. From the superset, the single image query is formed 1116. Features in overlapping portions of the images that are not reliable may be removed prior to submission 1106 of the query. In addition, for multiple images received for a single location, features for the same location on a document page may be used to compute a set of consensus features that provide greater accuracy, and feature sets that are inconsistent with the consensus features may be excluded from the superset. The combined features can be processed in a single image query because the positions of the images relative to each other, as well as their rotation and scale relative to each other, are known. In this example, no overlap between the images is needed. In one embodiment, the process ends when a recognition result is received 1118 based upon the single image query, which is then provided to the mobile device 102. In another embodiment, the mobile device 102 can act as a hand-held scanner, and thus when the recognition result is received 1118 corresponding to the single image query, the result may be cropped 1120, e.g., by tracking manager 403, based on the tracking information indicating the relative locations of the received images on the document page, such that a high resolution image is provided in the shape of the captured images, i.e., as if it had been scanned by the mobile device 102. FIG. 12B shows a cropped high resolution image 1406 resulting from the combination of received images 1202 according to a hand-held scanner functionality. In this example, the intersection points where the individual images come together can be computed using the sides of the polygons defined by the (x,y) values of the corners of the polygons.
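The feature-superset path (steps 1112 through 1116) can be sketched in Python under strong simplifying assumptions: each feature is an (x, y, descriptor) triple in its image's local coordinates, tracking supplies only a translation per image, and features count as being at the same location when they fall into the same grid cell. The function `build_superset`, the `cell` size, and the majority-vote rule are hypothetical; the sketch also drops individual disagreeing features rather than whole feature sets.

```python
from collections import Counter, defaultdict


def build_superset(feature_sets, offsets, cell=4):
    """Combine per-image features into one page-frame superset with consensus."""
    buckets = defaultdict(list)
    for features, (ox, oy) in zip(feature_sets, offsets):
        for x, y, desc in features:
            px, py = x + ox, y + oy                  # move into the shared page frame
            buckets[(px // cell, py // cell)].append((px, py, desc))
    superset = []
    for key in sorted(buckets):
        items = buckets[key]
        # Consensus descriptor: the most common one observed at this location.
        consensus = Counter(d for _, _, d in items).most_common(1)[0][0]
        agreeing = [(x, y) for x, y, d in items if d == consensus]
        cx = sum(x for x, _ in agreeing) / len(agreeing)
        cy = sum(y for _, y in agreeing) / len(agreeing)
        superset.append((cx, cy, consensus))         # inconsistent features are dropped
    return superset


image_a = [(0, 0, "a"), (10, 0, "b")]
image_b = [(0, 0, "b"), (10, 0, "c")]
image_c = [(0, 0, "z")]                               # disagrees with the other two
superset = build_superset([image_a, image_b, image_c], [(0, 0), (10, 0), (10, 0)])
```

Because every feature is placed in the same frame before the query is formed, the images need not overlap at all, consistent with the text above.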

MMR-Assisted Image Tracking

Just as image tracking can be used to improve MMR recognition results, MMR recognition performed periodically on a set of consecutive frames of a video stream can improve image tracking. Specifically, drift is a known problem with image tracking of long sequences due to the cumulative nature of camera motion estimation. See, e.g., Kuhn (2003), referenced above. Drift occurs as small errors in the frame-to-frame motion estimation accumulate. FIG. 13 is a flowchart showing a method of improved image tracking using MMR. MMR recognition can be used to determine absolute camera position in parallel with image tracking to correct for the cumulative drift, resulting in virtually drift-free tracking for long sequences. The method begins with receiving 1302 an image, and receiving 1304 image tracking information indicating a location of the received image on a document page relative to a location of a previously received image on the document page, e.g., from image tracker 240. Using this information, an image query is then submitted 1306 corresponding to the received image. A recognition result is then received 1308 associated with the image query. From the result, an absolute location of the received image on the document page can be determined 1310, e.g., by tracking manager 403. Then, the tracking information can be updated 1312 to reflect the absolute location of the received image on the document page. In another embodiment, some but not all of the received images are submitted as image queries. In this embodiment, the absolute location is only updated for images that are submitted. In one embodiment, the method includes retrieving a recognition result for the previously received image, and from that result submitting the associated document ID, along with the image tracking information, as the image query, wherein the absolute location is returned as the result.
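The drift correction of FIG. 13 can be illustrated with a small Python sketch. It assumes the tracker reports a per-frame (dx, dy) motion estimate and that recognition returns an absolute (x, y) page location for only some frames; the function name and the `absolute_fixes` mapping are hypothetical.

```python
def track_with_correction(relative_moves, absolute_fixes, start=(0.0, 0.0)):
    """Accumulate frame-to-frame motion, resetting to the absolute location
    whenever a recognition result is available for a frame.

    relative_moves: list of (dx, dy) estimates, one per frame
    absolute_fixes: dict mapping frame index -> absolute (x, y) from recognition
    """
    x, y = start
    path = []
    for i, (dx, dy) in enumerate(relative_moves):
        x, y = x + dx, y + dy          # dead reckoning; errors accumulate here
        if i in absolute_fixes:
            x, y = absolute_fixes[i]   # 1312: update tracking with the absolute location
        path.append((x, y))
    return path


# The tracker overestimates motion by 0.25 per frame; true motion is 1.0 per frame.
moves = [(1.25, 0.0)] * 4
drifting = track_with_correction(moves, {})
corrected = track_with_correction(moves, {1: (2.0, 0.0), 3: (4.0, 0.0)})
```

Submitting only every second frame for recognition, as in this example, corresponds to the embodiment in which some but not all received images are submitted as image queries.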

The foregoing description has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the techniques disclosed here to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. As will be understood by those familiar with the art, the techniques disclosed here may be embodied in other specific forms without departing from the spirit or essential characteristics disclosed. Likewise, the particular naming and division of the modules, routines, features, attributes, methodologies and other aspects are not mandatory or significant, and the mechanisms that implement the techniques or their features may have different names, divisions and/or formats. Furthermore, as will be apparent to one of ordinary skill in the relevant art, the modules, routines, features, attributes, methodologies and other aspects of the techniques can be implemented as software, hardware, firmware or any combination of the three. Also, wherever a component, an example of which is a module, is implemented as software, the component can be implemented as a standalone program, as part of a larger program, as a plurality of separate programs, as a statically or dynamically linked library, as a kernel loadable module, as a device driver, and/or in every and any other way known now or in the future to those of ordinary skill in the art of computer programming. Additionally, the techniques disclosed here are in no way limited to implementation in any specific programming language, or for any specific operating system or environment. Accordingly, the disclosure is intended to be illustrative, but not limiting, of the scope of the present invention, which is set forth in the following claims.

What is claimed is:
 1. A method comprising: receiving a plurality of images; combining the plurality of images into a merged image; forming a single image query based upon the merged image; submitting the single image query corresponding to the plurality of received images; and receiving a recognition result associated with the single image query.
 2. The method of claim 1, wherein combining the plurality of images into a merged image comprises: receiving image tracking information indicating relative locations of the received images to each other; and combining the plurality of images based on the image tracking information.
 3. The method of claim 2, wherein the image tracking information comprises instructions for the combining.
 4. The method of claim 1, wherein the merged image is a combined image stitched together from the plurality of images.
 5. The method of claim 1, wherein the merged image is a super-resolution image synthesized from the plurality of images.
 6. The method of claim 2, wherein the image tracking information includes sequence and relative timing information for the plurality of images.
 7. The method of claim 1, wherein submitting the single image query corresponding to the plurality of received images further comprises: extracting a set of features from each of the plurality of images; combining the extracted sets of features to form a spatially-consistent superset of features; and forming the single image query based upon the spatially-consistent superset of features.
 8. The method of claim 7, wherein features extracted from images for the same location are combined into consensus features.
 9. The method of claim 7, further comprising excluding from the superset extracted sets of features determined to be inconsistent with the consensus features.
 10. The method of claim 2, further comprising cropping the recognition result based upon the image tracking information indicating the relative locations of the received images.
 11. The method of claim 1, wherein the received plurality of images are from a video stream.
 12. A system comprising: one or more processors; and a memory coupled with the one or more processors, the memory storing instructions, which when executed, cause the one or more processors to: receive a plurality of images; combine the plurality of images into a merged image; form a single image query based upon the merged image; submit the single image query corresponding to the plurality of received images; and receive a recognition result associated with the single image query.
 13. The system of claim 12, wherein to combine the plurality of images into a merged image, the instructions cause the one or more processors to: receive image tracking information indicating relative locations of the received images to each other; and combine the plurality of images based on the image tracking information.
 14. The system of claim 12, wherein the merged image is a super-resolution image synthesized from the plurality of images.
 15. The system of claim 12, wherein the merged image is a combined image stitched together from the plurality of images.
 16. The system of claim 12, wherein to submit the single image query corresponding to the plurality of received images, the instructions cause the one or more processors to: extract a set of features from each of the plurality of images; combine the extracted sets of features to form a spatially-consistent superset of features; and form the single image query based upon the spatially-consistent superset of features.
 17. A computer program product comprising a non-transitory computer useable medium including a computer readable program, wherein the computer readable program, when executed on a computer, causes the computer to: receive a plurality of images; combine the plurality of images into a merged image; form a single image query based upon the merged image; submit the single image query corresponding to the plurality of received images; and receive a recognition result associated with the single image query.
 18. The computer program product of claim 17, wherein to combine the plurality of images into a merged image, the computer readable program causes the computer to: receive image tracking information indicating relative locations of the received images to each other; and combine the plurality of images based on the image tracking information.
 19. The computer program product of claim 17, wherein the merged image is a super-resolution image synthesized from the plurality of images.
 20. The computer program product of claim 17, wherein the merged image is a combined image stitched together from the plurality of images.